Abstract

We consider simultaneously identifying the membership and locations of point sources that are convolved
with different band-limited point spread functions, from the observation of their superpositions.
This problem arises in three-dimensional super-resolution single-molecule imaging, neural spike sorting, 
multi-user channel identification, among other applications. We propose a novel algorithm, based on 
convex programming, and establish its near-optimal performance guarantee for exact recovery in the 
noise-free setting by exploiting the spectral sparsity of the point source models as well as the incoherence 
between point spread functions. Furthermore, robustness of the recovery algorithm in the presence of 
bounded noise is also established. Numerical examples are provided to demonstrate the effectiveness of 
the proposed approach. 
1 Introduction

In many emerging applications in applied science and engineering, the acquired signal at the sensor can be 
regarded as a noisy superposition of returns from multiple modalities, where the return from each modality
is a band-limited observation of a point source signal captured through a low-pass point spread function,
governed by either the underlying physical field or the system design. Mathematically, we consider the
following parametric mixture model of the acquired signal, $y(t)$, given as

$$y(t) = \sum_{i=1}^{I} x_i(t) * g_i(t) + w(t) = \sum_{i=1}^{I} \left( \sum_{k=1}^{K_i} a_{ik}\, g_i(t - \tau_{ik}) \right) + w(t), \qquad (1)$$

where $*$ denotes the convolution operator, $w(t)$ is an additive noise, and $I$ is the total number of modalities.
Moreover,

$$x_i(t) = \sum_{k=1}^{K_i} a_{ik}\, \delta(t - \tau_{ik})$$

is the point source signal observed from the $i$th modality, and $g_i(t)$ is the corresponding point spread
function. For the $i$th modality, let $\tau_{ik} \in [0,1)$ and $a_{ik} \in \mathbb{C}$ be the location and the amplitude of the $k$th
point source, $1 \le k \le K_i$, respectively, where the locations $\tau_{ik}$'s are continuous-valued and
can lie anywhere in the parameter space, at nature's will. The point source model can be used to model
a variety of physical phenomena occurring in a wide range of practical problems, such as the activation
pattern of fluorescence in single-molecule imaging [1], sparse channel impulse response in multi-path fading
environments, the locations of pollution plants in urban areas, firing times of neurons, and many more.

*This paper has been presented in part at the 2015 International Symposium on Information Theory (ISIT) and the 2015
International Conference on Sampling Theory and Applications (SampTA).
Our goal is to stably invert for the field parameters, i.e. the parameters of the point source models, of each
modality from the acquired signal reflecting the ensemble behavior of all modalities, even in the presence 
of noise. This allows us to separate the contributions of each modality to the acquired signal. Moreover, 
typically we are interested in super resolution, i.e. resolving the parameters at a resolution much higher 
than the native resolution of the acquired signal, determined by the Rayleigh limit, or in other words, the 
reciprocal of the bandwidth of the point spread functions. 

1.1 Motivating Applications 

The mixture model (1) is motivated by the modeling and analysis of many practical problems, such as 
three-dimensional super-resolution single-molecule imaging [2, 3], spike sorting in neural recording [4, 5], 
multi-user multi-path channel identification [6, 7], and blind calibration of time-interleaved analog-to-digital 
converters [8, 9]. We describe several example applications below. 

Three-dimensional super-resolution single-molecule imaging: By employing photoswitchable fluorescent
molecules, the imaging process of single-molecule microscopies (Stochastic Optical Reconstruction
Microscopy (STORM) [1] or Photo Activated Localization Microscopy (PALM) [10]) is divided into many
frames, where in each frame, a sparse number of fluorophores (point sources) are randomly activated,
localized at a resolution below the diffraction limit, and deactivated. The final image is thus obtained by
superimposing the localization outcomes of all the frames. This principle can be extended to reconstruct a 
3-D object from 2-D image frames, for example, by introducing a cylindrical lens to modulate the ellipticity 
of the point spread function based on the depth of the fluorescent object in 3-D STORM [2]. Therefore, 
the acquired image in each frame can be regarded as a superposition of returns from multiple depth layers, 
where the return from each layer corresponds to the convolution outcome of the fluorophores in that depth 
layer with the depth-dependent point spread function, as modeled in (1). The goal is thus to recover the 
locations and depth membership of each point source given the image frame. 

Spike sorting for neural recording: Neurons in the brain communicate by firing action potentials, i.e. 
spikes, and it is possible to capture their communications through the use of a microelectrode, which records 
simultaneous activities of multiple neurons within a local neighborhood. Spike sorting [11], thus, refers to 
the grouping of spikes according to each neuron, from the recording of the microelectrode. Interestingly, it 
is possible to model the spike fired by each neuron with a characteristic shape [12] . The neural recording 
can thus be modeled as a superposition of returns from multiple neurons, as in (1), where the return from 
each neuron corresponds to the convolution of its characteristic spike shape with the sequence of its firing 
times. A similar problem also arises in DNA sequencing; see [13].

Multi-path identification in random-access channels: In the multi-user multiple-access model [7], each
active user transmits a signature waveform modulated via a signature sequence, which can be designed to
optimize performance, and the base station receives a superposition of returns from active users, as in (1),
where the received signal from each active user corresponds to the convolution of its signature waveform 
with the unknown sparse multi-path channel from the user to the base station. The goal is to identify the 
set of active users, as well as their channel states, from the received signal at the base station. 

1.2 Related Work and Our Contributions 

There is an extensive research literature [14] on inverting (1) when there is only a single modality with
$I = 1$, where approaches for parameter estimation, ranging from conventional ones such as matched filtering,
MUSIC [15], and matrix pencil [16] to more recent ones based on the trigonometric polynomial frame [17] or total
variation minimization [18], can be applied. However, these approaches cannot be applied directly when
multiple modalities exist in the observed signal, due to the mutual interference. To the best of the authors'
knowledge, methods for inverting (1) with multiple modalities have been extremely limited. Sparse recovery
algorithms have been proposed to estimate the mixture model in [19, 6, 7] with a discretized set of delays, but
the performance may degrade when the actual delays do not belong to the discrete grid [20]. Even when all
the point sources indeed lie on the grid, existing work suggests that the sample complexity, or the bandwidth
of the acquired signal, may have to grow logarithmically with the size of the grid, which is undesirable. More
recently, [4, 5] have proposed heuristic sparse recovery algorithms to estimate the continuous-valued delays in
the mixture model for spike sorting; however, no performance guarantees are available. Finally, an algebraic
approach has been proposed in [8], but it is sensitive to noise due to the nature of the employed root-finding
procedure and does not extend well to a large number of modalities due to the prohibitive sample complexity.

In this paper, we study the problem of super-resolving the mixture model (1) when there are two modalities,
i.e. $I = 2$. The methodology in this paper can be extended straightforwardly to the analysis of the case
$I > 2$, which is left for future work. We start by recognizing that in the Fourier domain, the observed signal
can be regarded as a linear combination of two spectrally-sparse signals, each composed of a small number
of distinct complex sinusoids. The atomic norm [21, 22] of spectrally-sparse signals has recently been
proposed as a convex optimization framework for promoting parsimonious structures [18, 21, 22, 23, 24],
and can be computed efficiently via semidefinite programming. We then separate and recover the
two signals by promoting their spectral structures using atomic norm minimization, in addition to satisfying
the observation constraints. The proposed algorithm, denoted by AtomicDemix, is reminiscent of the
algorithms for sparse error correction [25], robust principal component analysis [26], demixing of sines and spikes
[27, 24], and source separation [28], where one aims to separate two low-dimensional signals with incoherent
structures via convex optimization.

The separation and identification of the two point source signals, using the proposed AtomicDemix
algorithm, is made possible with two additional natural conditions. The first condition is that the point
source signal of each modality satisfies a mild separation condition, such that the locations of the point
sources are separated by at least four times the Rayleigh limit; this is the same separation condition required
by Candes and Fernandez-Granda [18] even with $I = 1$ when applying total variation minimization for
super-resolution. The second condition is that the point spread functions of different modalities have to be
sufficiently incoherent, which is supplied in our theoretical analysis by assuming they are randomly generated
from a uniform distribution on the complex unit circle. Define $K_{\max} = \max\{K_1, K_2\}$. Our main results are
summarized as below:

• For the noise-free case, we demonstrate that, provided that the coefficients of the point sources have
symmetric random signs, that is to say the signs of the coefficients of the point sources are randomly
generated from a symmetric distribution on the complex unit circle, as soon as the number
of measurements $M$, or equivalently, the bandwidth of the point spread functions, is on the order
$M/\log M = O(K_{\max} \log(K_1 + K_2))$, AtomicDemix exactly recovers the point source model of each
modality with high probability. Since at least an order of $O(K_1 + K_2)$ measurements is necessary, our
sample complexity is near-optimal up to logarithmic factors. When the coefficients of the point sources
have arbitrary signs, we establish a similar performance guarantee with a higher sample complexity,
on the order of $M = O(K_{\max}^2 \log(K_1 + K_2))$.

• For the noisy case, when the coefficients of the point sources have arbitrary signs, under the same conditions
that guarantee exact recovery in the noise-free case, we establish that AtomicDemix is stable in the
presence of possibly adversarial bounded noise.

• The point sources of each modality can be localized from the dual solution of the proposed algorithms,
without estimating or knowing the model order a priori. Numerical examples are provided to corroborate
the theoretical analysis, with comparisons against the standard Cramer-Rao Bound (CRB) for
parameter estimation.

1.3 Organization and Notations 

The rest of this paper is organized as follows. We specify the problem formulation and main results in 
Section 2. Numerical experiments are provided to corroborate the theoretical analysis in Section 3. Section 4 
and Section 5 provide detailed proof procedures of our main results for the noise-free case and the noisy case, 
respectively. Finally, the paper is concluded in Section 6 with discussions on extensions and future work. 

Throughout the paper, $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and Hermitian transpose, respectively, and
$\overline{(\cdot)}$ denotes the (element-wise) conjugate of a complex scalar or vector. For a function $f(\tau)$ with variable
$\tau$, we denote its first-order derivative and second-order derivative by $f'(\tau)$ and $f''(\tau)$, respectively. We also
use $f^{(l)}(\tau)$ to represent its $l$th-order derivative. The quantity $\sqrt{-1}$ is denoted by $j$. Besides, we use $C$ with
different superscripts and subscripts to represent constants, whose values may change from line to line.
2 Problem Formulation and Main Results

2.1 Observation Model 


Due to hardware and physical limits, the resolution of the sensor suite is limited by the diffraction limit or
Rayleigh limit, which heuristically is often referred to as half the width of the mainlobe of the $g_i(t)$'s.
Alternatively, in the frequency domain, we say the $g_i(t)$'s are band-limited with cut-off frequency $2M$. Denote the
discrete-time Fourier transform of $g_i(t)$ as

$$g_{i,n} = \int_{-\infty}^{\infty} g_i(t)\, e^{-j2\pi n t}\, dt, \qquad (2)$$


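then $g_{i,n} = 0$ whenever $n \notin \Omega_M = \{-2M, \ldots, 0, \ldots, 2M\}$. Taking the discrete-time Fourier transform of
(1), the measurements can be represented in the Fourier domain as

$$y_n = \sum_{i=1}^{I} g_{i,n} \left( \sum_{k=1}^{K_i} a_{ik}\, e^{-j2\pi n \tau_{ik}} \right) + w_n, \quad n \in \Omega_M, \qquad (3)$$

where the noise $w_n$ is

$$w_n = \int_{-\infty}^{\infty} w(t)\, e^{-j2\pi n t}\, dt, \quad n \in \Omega_M.$$

When $I = 2$, the measurements (3) in the Fourier domain can be equivalently formulated as

$$y_n = g_{1,n} \left( \sum_{k=1}^{K_1} a_{1k}\, e^{-j2\pi n \tau_{1k}} \right) + g_{2,n} \left( \sum_{k=1}^{K_2} a_{2k}\, e^{-j2\pi n \tau_{2k}} \right) + w_n, \quad n \in \Omega_M. \qquad (4)$$
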


The measurements $y_n$'s in (4) can be considered as a linear combination of two spectrally-sparse signals,
with the $g_{i,n}$'s determining the combination coefficients. In vector form, we have

$$y = g_1 \odot x_1^* + g_2 \odot x_2^* + w, \qquad (5)$$

where $y = [y_{-2M}, \ldots, y_0, \ldots, y_{2M}]^T$, $w = [w_{-2M}, \ldots, w_{2M}]^T$, $g_i = [g_{i,-2M}, \ldots, g_{i,0}, \ldots, g_{i,2M}]^T$ for
$i = 1, 2$, and $\odot$ denotes the Hadamard (element-wise) product operator. Furthermore, let $x_1^* \in \mathbb{C}^{4M+1}$ and
$x_2^* \in \mathbb{C}^{4M+1}$ denote two spectrally-sparse signals, each composed of a small number of distinct complex
harmonics, represented as

$$x_1^* = \sum_{k=1}^{K_1} a_{1k}\, c(\tau_{1k}), \quad \text{and} \quad x_2^* = \sum_{k=1}^{K_2} a_{2k}\, c(\tau_{2k}), \qquad (6)$$

where $K_1$ is the spectral sparsity of $x_1^*$ and $K_2$ is the spectral sparsity of $x_2^*$. The atom $c(\tau)$ is defined as

$$c(\tau) = \left[ e^{-j2\pi(-2M)\tau}, \ldots, 1, \ldots, e^{-j2\pi(2M)\tau} \right]^T,$$

which corresponds to a point source at the location $\tau \in [0,1)$. Further denote the location sets of point
sources in $x_1^*$ and $x_2^*$ by $T_1 = \{\tau_{11}, \ldots, \tau_{1K_1}\}$ and $T_2 = \{\tau_{21}, \ldots, \tau_{2K_2}\}$, respectively. The goal is thus to
recover $T_1$ and $T_2$, and their corresponding amplitudes, from the observation (5).

Intuitively, it is impossible to separate the two modalities if $g_1$ and $g_2$ are highly coherent. In this paper,
we assume the entries of the point spread functions, the $g_{i,n}$'s, are i.i.d. generated from a uniform distribution
on the complex unit circle. This randomness assumption is reasonable when the $g_{i,n}$'s can be designed, such
as the spreading sequences in multi-user communications, and provides the incoherence between different
modalities that is necessary for separation. Multiplying both sides of (4) with $\bar{g}_{1,n}$, and with slight abuse of
notation, (5) can be rewritten as

$$y = x_1^* + g \odot x_2^* + w \in \mathbb{C}^{4M+1}, \qquad (7)$$

where $g = [g_{-2M}, \ldots, g_0, \ldots, g_{2M}]^T \in \mathbb{C}^{4M+1}$ with $g_n = g_{2,n}\, \bar{g}_{1,n}$ uniformly drawn from the complex unit
circle. In the noisy case, we consider the scenario where $w$ is bounded as $\|w\|_2 \le \sigma_w$.
2.2 AtomicDemix: A Convex Program for Demixing

Define the atomic norm [21, 22, 23] of $x \in \mathbb{C}^{4M+1}$ with respect to the atoms $c(\tau)$ as

$$\|x\|_{\mathcal{A}} = \inf_{a_k \in \mathbb{C},\, \tau_k \in [0,1)} \left\{ \sum_k |a_k| \;:\; x = \sum_k a_k\, c(\tau_k) \right\},$$

which can be regarded as the tightest convex relaxation of counting the smallest number of atoms $c(\tau)$ that
is needed to represent a signal $x$. Therefore, we seek to recover the signals $x_1^*$ and $x_2^*$ by promoting their
spectral sparsity via minimizing the sum of their atomic norms, subject to the observation constraint
in the noise-free case where $w = 0$:

$$\{\hat{x}_1, \hat{x}_2\} = \arg\min_{x_1, x_2} \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}}, \quad \text{s.t.} \quad y = x_1 + g \odot x_2. \qquad (8)$$

In the noisy case, we propose a regularized atomic norm minimization algorithm as

$$\{\hat{x}_1, \hat{x}_2\} = \arg\min_{x_1, x_2} \frac{1}{2} \|y - x_1 - g \odot x_2\|_2^2 + \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} \right), \qquad (9)$$

where $\lambda_w$ is the regularization parameter to balance the data fitting term and the structure-promoting term,
to be determined later. The above algorithms are referred to as AtomicDemix. Interestingly, the atomic
norm $\|x_i\|_{\mathcal{A}}$ can be equivalently characterized via semidefinite programming [23]; therefore the proposed
algorithms can be solved efficiently using off-the-shelf solvers.


2.3 Performance Guarantee in the Noise-free Case 

Recall $K_{\max} = \max\{K_1, K_2\}$. Define the separation of the point source signal of the $i$th modality as

$$\Delta_i = \min_{k \ne t} |\tau_{ik} - \tau_{it}|, \qquad (10)$$

which is understood as the wrap-around distance on $[0,1)$, and the minimum separation of the point
source signals of all modalities as $\Delta = \min_i \Delta_i$. We have the following performance guarantee for the
noise-free algorithm (8), whose proof is provided in Section 4.

Theorem 2.1 (Noise-free Case). Assume that $g_n = e^{j2\pi\phi_n}$'s are i.i.d. randomly generated from a uniform
distribution on the complex unit circle with $\phi_n \sim \mathcal{U}[0,1]$, and that the minimum separation satisfies $\Delta \ge 1/M$.
Let $\eta \in (0,1)$, then there exists a numerical constant $C$ such that

$$M \ge C \max\left\{ \log^2 \frac{M(K_1 + K_2)}{\eta},\; K_{\max} \log\frac{K_1 + K_2}{\eta} \log\frac{M(K_1 + K_2)}{\eta},\; K_{\max}^2 \log\frac{K_1 + K_2}{\eta} \right\} \qquad (11)$$

is sufficient to guarantee that $x_1^*$ and $x_2^*$ are the unique solutions of (8) with probability at least $1 - \eta$.

Moreover, if the signs of the coefficients $a_{ik}$'s are i.i.d. generated from a symmetric distribution on the
complex unit circle, there exists a numerical constant $C$ such that

$$M \ge C \max\left\{ \log^2 \frac{M(K_1 + K_2)}{\eta},\; K_{\max} \log\frac{K_1 + K_2}{\eta} \log\frac{M(K_1 + K_2)}{\eta} \right\} \qquad (12)$$

is sufficient to guarantee that $x_1^*$ and $x_2^*$ are the unique solutions of (8) with probability at least $1 - \eta$.


Theorem 2.1 provides two sample complexities depending on whether the signs of the coefficients $a_{ik}$'s are
random. Given random signs of the $a_{ik}$'s, Theorem 2.1 indicates that as soon as the number of measurements $M$
is on the order $M/\log M = O(K_{\max} \log(K_1 + K_2))$, AtomicDemix exactly recovers the point source models
with high probability. This suggests that the performance of AtomicDemix is near-optimal in terms of the
sample complexity, as at least $O(K_1 + K_2)$ measurements are necessary to identify the unknown parameters.
Without requiring random signs of the $a_{ik}$'s, the sample complexity is slightly higher, roughly dominated by the
last term on the order of $M = O(K_{\max}^2 \log(K_1 + K_2))$.
Remark 1. The separation condition $\Delta \ge 1/M$ is a sufficient condition in Theorem 2.1 to guarantee accurate
signal demixing, which is the same as the one required by Candes and Fernandez-Granda in [18] even with
$I = 1$. Our results suggest that the separation condition to achieve super resolution in mixture models is
no stronger than that required even in the single modality case, provided the point spread functions are
incoherent enough. It is implied in [18, 29] that a reasonable separation is also necessary to guarantee stable
super-resolution. Interestingly, no separation between point sources from different modalities is required, as
long as their point spread functions are incoherent enough.
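
As a concrete reading of the separation condition, the following minimal sketch computes the wrap-around separation within each modality and checks $\Delta \ge 1/M$. The function names are hypothetical, and the check deliberately ignores distances between sources of different modalities, in line with the last observation of Remark 1.

```python
def wrap_distance(a, b):
    """Wrap-around distance between two locations on [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def min_separation(taus):
    """Separation Delta_i of one modality, as in (10); infinite for fewer than two sources."""
    if len(taus) < 2:
        return float("inf")
    return min(
        wrap_distance(a, b)
        for i, a in enumerate(taus)
        for b in taus[i + 1:]
    )


def satisfies_separation(T1, T2, M):
    """True if Delta = min(Delta_1, Delta_2) >= 1/M; cross-modality gaps are not checked."""
    delta = min(min_separation(T1), min_separation(T2))
    return delta >= 1.0 / M
```

For instance, `satisfies_separation([0.1, 0.4, 0.7], [0.11, 0.5], 8)` holds even though the locations 0.1 and 0.11 nearly coincide, because they belong to different modalities.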

Remark 2. Theorem 2.1 assumes gn’s are i.i.d. from a uniform distribution on the complex unit circle, which 
may be relaxed as long as g^s are independently drawn from a distribution satisfying E [g„] = E ^ 

and $C_1 \le |g_n| \le C_2$ for some constants $0 < C_1 \le C_2$. Both $\mathrm{sign}(a_{1k})$ and $\mathrm{sign}(a_{2k})$ are assumed randomly
generated, which is reasonable in many applications.

Remark 3. Theorem 2.1 can also be extended to multi-dimensional point source models, following similar
techniques in [30], where the same order of measurements shall be sufficient to localize the point sources
under similar mild separation conditions. We leave this extension to interested readers.

2.4 Performance Guarantee in the Noisy Case 

In the presence of bounded noise, AtomicDemix in (9) still stably recovers the point source signals, as 
established in the following theorem, whose proof is provided in Section 5. 

Theorem 2.2 (Noisy Case). Let + 1, for some constant Cw > 1 large enough. Assume 

that $g_n = e^{j2\pi\phi_n}$'s are i.i.d. randomly generated from a uniform distribution on the complex unit circle with
$\phi_n \sim \mathcal{U}[0,1]$, and that the minimum separation satisfies $\Delta \ge 1/M$. Let $\eta \in (0,1)$, then as long as (11) holds
for some constant $C$, the solution to (9) satisfies

(11*1 - a^lll2 + 11*2 - 3 : 2112 ) < C'lO-^V^^maxlogM, (13) 

and 

1 logM\^^'‘ 

+ 1 + 9e^2)-{xl+gQ 3;*)||, < C2a. , (14) 

with probability at least 1 — r] — \ogM)~^f^, where Ci, C 2 and C 3 are some constants. 

Theorem 2.2 does not make any assumptions on the signs of the coefficients of point sources. It guarantees
the stability of inversion in the presence of bounded noise, even when the noise is adversarially generated.
When $\sigma_w = 0$, Theorem 2.2 degenerates to the noise-free case, providing a performance guarantee of
AtomicDemix in accordance with Theorem 2.1 when the point sources have deterministic coefficients. The first
bound (13) concerns signal reconstruction, which guarantees that one can stably separate $x_1^*$ and $x_2^*$ even in
the presence of noise. The second bound (14) concerns denoising, which guarantees that AtomicDemix can
output a denoised signal $\hat{y} = \hat{x}_1 + g \odot \hat{x}_2$ whose error is proportional to the noise level.

2.5 Localization via Dual Polynomials 

With the demixing results $\hat{x}_1$ and $\hat{x}_2$, the source locations $\tau_{ik}$'s of each signal can be estimated accurately by
MUSIC [15], ESPRIT [31], Prony's method [32] or other linear prediction methods. More interestingly,
the source locations can be identified directly from the dual solutions of (8) and (9). The coefficients $a_1$ and
$a_2$ can then be estimated by least-squares using the estimates of the $\tau_{ik}$'s.

We first characterize the dual problems of (8) and (9). Define the inner product of two vectors as
$\langle p, x \rangle = x^H p$ and the real-valued inner product as $\langle p, x \rangle_{\mathbb{R}} = \mathrm{Re}(x^H p)$, where $\mathrm{Re}(\cdot)$ takes the real part of a
complex scalar. The dual norm of $\|\cdot\|_{\mathcal{A}}$ can be represented as

$$\|p\|_{\mathcal{A}}^* = \sup_{\|x\|_{\mathcal{A}} \le 1} \langle p, x \rangle_{\mathbb{R}} = \sup_{\tau \in [0,1)} |\langle p, c(\tau) \rangle| = \sup_{\tau \in [0,1)} \left| \sum_{n=-2M}^{2M} p_n\, e^{j2\pi n \tau} \right|,$$

where $p = [p_{-2M}, \ldots, p_0, \ldots, p_{2M}]^T$. Then the dual problem of (8) can be written as

$$\hat{p} = \arg\max_{p} \langle p, y \rangle_{\mathbb{R}}, \quad \text{s.t.} \quad \|p\|_{\mathcal{A}}^* \le 1, \quad \|\bar{g} \odot p\|_{\mathcal{A}}^* \le 1, \qquad (15)$$

whose derivation can be found in Appendix B. Similarly, by standard Lagrangian calculation the dual
problem of (9) can be obtained as

$$\hat{p} = \arg\max_{p} \frac{1}{2} \left( \|y\|_2^2 - \|y - \lambda_w p\|_2^2 \right), \quad \text{s.t.} \quad \|p\|_{\mathcal{A}}^* \le 1, \quad \|\bar{g} \odot p\|_{\mathcal{A}}^* \le 1. \qquad (16)$$

Based on the definition of the dual norm, define the dual polynomials $P(\tau)$ and $Q(\tau)$ generated from
the dual solutions of (15) or (16) as

$$P(\tau) = \sum_{n=-2M}^{2M} \hat{p}_n\, e^{j2\pi n \tau}, \qquad Q(\tau) = \sum_{n=-2M}^{2M} \hat{p}_n\, \bar{g}_n\, e^{j2\pi n \tau}.$$

Then the source locations can be identified as

$$\hat{T}_1 = \{\tau \in [0,1) : |P(\tau)| = 1\}, \quad \text{and} \quad \hat{T}_2 = \{\tau \in [0,1) : |Q(\tau)| = 1\}.$$

For the noise-free case, it is straightforward to show (see Appendix C) that $T_1 \subseteq \hat{T}_1$ and $T_2 \subseteq \hat{T}_2$ whenever the optimal
primal solution is $\{x_1^*, x_2^*\}$. Note however that in general both $\hat{T}_1$ and $\hat{T}_2$ may contain spurious
source locations. Interested readers can refer to relevant discussions in [23, Proposition 2.5] on when the
dual polynomials return exact source locations, which also apply to our proposed algorithms with little
modification.
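
The localization rule can be sketched numerically by evaluating $|P(\tau)|$ and $|Q(\tau)|$ on a fine grid and keeping the points where the modulus reaches one up to a tolerance. This is only an illustration under stated assumptions: the grid size, the tolerance, and the function names are hypothetical, and the vector `p` in the toy example is hand-built (a scaled atom) rather than the output of a solver for (15) or (16).

```python
import cmath
import math


def dual_polynomials(p, g, tau, M):
    """Return (P(tau), Q(tau)) with P = sum p_n e^{j2 pi n tau}, Q = sum p_n conj(g_n) e^{j2 pi n tau}."""
    P = 0j
    Q = 0j
    for idx, n in enumerate(range(-2 * M, 2 * M + 1)):
        e = cmath.exp(2j * math.pi * n * tau)
        P += p[idx] * e
        Q += p[idx] * g[idx].conjugate() * e
    return P, Q


def localize(p, g, M, grid_size=2048, tol=1e-6):
    """Grid points where |P| (resp. |Q|) reaches 1 up to tol; an on-grid approximation only."""
    T1_hat, T2_hat = [], []
    for i in range(grid_size):
        tau = i / grid_size
        P, Q = dual_polynomials(p, g, tau, M)
        if abs(P) >= 1.0 - tol:
            T1_hat.append(tau)
        if abs(Q) >= 1.0 - tol:
            T2_hat.append(tau)
    return T1_hat, T2_hat


# Toy example (not a solver output): p is a scaled atom at 0.25, so |P| peaks at 0.25;
# g_n = exp(j 2 pi n 0.125) is a pure modulation, which moves the peak of |Q| to 0.375.
M = 4
idx_range = range(-2 * M, 2 * M + 1)
p = [cmath.exp(-2j * math.pi * n * 0.25) / (4 * M + 1) for n in idx_range]
g = [cmath.exp(2j * math.pi * n * 0.125) for n in idx_range]
T1_hat, T2_hat = localize(p, g, M)
```

In practice the peaks rarely fall exactly on a grid, so a root-finding step or a local refinement around the grid maxima would be needed; the grid search above only conveys the idea.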


3 Numerical Examples 

We carry out a series of numerical simulations to validate the performance of AtomicDemix in both noise-free 
and noisy cases under different parameter settings. 

3.1 Phase Transitions in the Noise-free Case 

We first examine the phase transition as a function of $(K_1, K_2)$ for a fixed $M$. We vary the spectral sparsity
levels of the two modalities as $K_1$ and $K_2$. For each pair of $(K_1, K_2)$, we first randomly generate a pair of
point source location sets $T_1$ and $T_2$ that satisfy a separation condition $\Delta \ge 1/(2M)$, with the coefficients of the point
sources i.i.d. drawn from the complex standard Gaussian distribution. For each Monte Carlo trial, we then
randomly generate the point spread functions $g_n$'s in the Fourier domain with i.i.d. entries drawn uniformly
from the complex unit circle, and perform AtomicDemix by solving (8) using CVX [33]. The algorithm is
considered successful when the normalized estimate error satisfies 11®* ~ II 2 / Il®i^ll 2 — 10“^. 

Fig. 1 shows the success rates of AtomicDemix over 20 Monte Carlo trials for each cell, when $M = 8$
in (a) and $M = 16$ in (b), respectively. Fig. 2 (a) shows the success rates of AtomicDemix with respect to
$M$ for different values of $K_1 = K_2$, and Fig. 2 (b) shows the success rates of AtomicDemix with respect to
$K_1 = K_2$ for different values of $M$.

3.2 Point Source Recovery from Dual Polynomials 

As described earlier, the locations of the point sources can be recovered from the dual solutions of the
proposed algorithm. Fix $M = 16$, $K_1 = 4$ and $K_2 = 3$. We randomly generate a pair of point sources that
satisfy a separation condition $\Delta \ge 1/(2M)$, with the coefficients of the point sources i.i.d. drawn from the
complex standard Gaussian distribution. In the noise-free case, the amplitudes of the dual polynomials $P(\tau)$
and $Q(\tau)$ constructed from the solution of (15) are shown in Fig. 3 (a), superimposed on the ground truth,
indicating the accurate recovery of the point sources.
Figure 1. Success rates of AtomicDemix as a function of $(K_1, K_2)$ when (a) $M = 8$ and (b) $M = 16$.

Figure 2. Success rates of AtomicDemix in the noise-free case (a) with respect to $M$ for various $K_1 = K_2$
and (b) with respect to $K_1 = K_2$ for various $M$.


We then consider the noisy case when the noise is composed of i.i.d. complex Gaussian entries $\mathcal{CN}(0, \sigma^2)$,
and set Au, = (1 + log (4M+1) + V21oga + 2 + y^), where a = 8Tr{4M + 1) log(4M + 1) 

based on the discussions in [34, 35] or A^ = + 1 a/ 1.2 log (Stt (4M + 1) log (4M + 1)) for simplic¬ 

ity of use. The amplitudes of the dual polynomials P (r) and Q (r) are shown in Fig. 3 (b) and (c) 
for SNR = 16 dB and SNR = 5dB, respectively, where the Signal-to-Noise Ratio (SNR) is defined as 

SNR = lOlogj^g ^ ^ g jg clear that the source locations can be estimated stably from 

the dual solutions, and the performance degrades gracefully as the noise level increases.

3.3 Comparisons with CRB for Point Source Localization 

We further examine the performance of (9) on estimating the locations of the point sources from noisy
measurements by comparing it against the CRB. Specifically, consider the special case with a single point
source for each modality, by letting $K_1 = K_2 = 1$. Denote the point source locations in $x_1^*$ and $x_2^*$ by $\tau_1$
and $\tau_2$, respectively. We assume the corresponding amplitude of each point source is known and unity when
Figure 3. Point source localization from the dual polynomials $P(\tau)$ and $Q(\tau)$ (a) in the absence of noise, (b) at SNR = 16 dB, and (c) at
SNR = 5 dB, for $M = 16$, $K_1 = 4$ and $K_2 = 3$.


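computing the CRB for estimating $\tau_1$ and $\tau_2$, which can be found as the diagonal entries of the inverse of
the following Fisher information matrix:

$$J(\tau_1, \tau_2) = \frac{8\pi^2}{\sigma^2} \begin{bmatrix} \sum_{n=-2M}^{2M} n^2 & \mathrm{Re}\left( \sum_{n=-2M}^{2M} n^2 g_n\, e^{j2\pi n(\tau_1 - \tau_2)} \right) \\ \mathrm{Re}\left( \sum_{n=-2M}^{2M} n^2 g_n\, e^{j2\pi n(\tau_1 - \tau_2)} \right) & \sum_{n=-2M}^{2M} n^2 \end{bmatrix}.$$
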


For each SNR, we randomly generate 200 noise realizations and compute the average squared estimate error
$(\hat{\tau}_i - \tau_i)^2$, where $\hat{\tau}_i$ is the location estimate obtained from the dual solution of (9), $i = 1,2$. Fig. 4 shows the average squared estimate error in
comparison with the CRB with respect to SNR when $M = 10$ in (a) and $M = 16$ in (b). The performance
of parameter estimation shows a similar “thresholding effect” [36] as for conventional spectrum estimation 
algorithms, where the average squared estimate error approaches the CRB as soon as SNR is large enough. 
Moreover, as we increase M, the threshold SNR becomes smaller. Characterizing the exact threshold SNR 
for AtomicDemix is an interesting future research topic. 
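
The CRB computation for this special case can be sketched as follows. The sketch assumes the model $y_n = e^{-j2\pi n \tau_1} + g_n e^{-j2\pi n \tau_2} + w_n$ with unit amplitudes and circular complex Gaussian noise of variance $\sigma^2$ per entry, together with the standard Fisher information formula $J = (2/\sigma^2)\,\mathrm{Re}(D^H D)$ for such noise; the normalization and the function names are assumptions of this illustration rather than details confirmed by the text.

```python
import cmath
import math


def fisher_information(tau1, tau2, g, M, sigma2):
    """2x2 Fisher information for (tau1, tau2) under the assumed unit-amplitude model."""
    scale = 8.0 * math.pi ** 2 / sigma2
    diag = 0.0
    cross = 0.0
    for idx, n in enumerate(range(-2 * M, 2 * M + 1)):
        diag += n * n
        # Real part of n^2 * g_n * exp(j 2 pi n (tau1 - tau2)): coupling between modalities.
        cross += (n * n * g[idx] * cmath.exp(2j * math.pi * n * (tau1 - tau2))).real
    return [[scale * diag, scale * cross], [scale * cross, scale * diag]]


def crb(J):
    """Diagonal entries of the inverse of a 2x2 matrix J (raises ZeroDivisionError if singular)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return J[1][1] / det, J[0][0] / det
```

When the cross term vanishes, each bound reduces to $\sigma^2 / (8\pi^2 \sum_n n^2)$, the bound for a single isolated source; a nonzero cross term can only increase the bound, which reflects the cost of interference between the two modalities.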
Figure 4. The comparisons between the average squared estimate error of point source localization and the
corresponding CRB with respect to SNR, when (a) $M = 10$, (b) $M = 16$.


4 Proof of Theorem 2.1 

In this section, we proceed to prove Theorem 2.1. We first provide the optimality conditions using dual
polynomials to certify the optimality of the solution of (8). Inspired by [18, 23], where the dual polynomial
is constructed using the squared Fejér kernel, we propose a construction of dual polynomials which are
composed of a deterministic term and a random perturbation term induced by the interference between
modalities. Finally, we show that the constructed dual polynomials satisfy the optimality conditions with
high probability when the sample complexity $M$ is large enough.

4.1 Optimality Conditions using Dual Polynomials 

We first certify the optimality of the primal problem (8) using the following proposition whose proof is in 
Appendix D. 

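Proposition 1. $\{x_1^*, x_2^*\}$ is the unique optimizer of (8) if there exists a vector $p = [p_{-2M}, \ldots, p_0, \ldots, p_{2M}]^T$
such that the dual polynomials $P(\tau)$ and $Q(\tau)$ constructed from it, represented as

$$P(\tau) = \sum_{n=-2M}^{2M} p_n\, e^{j2\pi n \tau}, \qquad Q(\tau) = \sum_{n=-2M}^{2M} p_n\, \bar{g}_n\, e^{j2\pi n \tau}, \qquad (17)$$

satisfy

$$\begin{cases} P(\tau_{1k}) = \mathrm{sign}(a_{1k}), & \forall \tau_{1k} \in T_1, \\ |P(\tau)| < 1, & \forall \tau \notin T_1, \\ Q(\tau_{2k}) = \mathrm{sign}(a_{2k}), & \forall \tau_{2k} \in T_2, \\ |Q(\tau)| < 1, & \forall \tau \notin T_2, \end{cases} \qquad (18)$$

where the sign should be understood as the complex sign.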

4.2 Constructing the Dual Certificate 

Proposition 1 suggests that if we can find a vector $p$ to construct two dual polynomials $P(\tau)$ and $Q(\tau)$ in
(17) that satisfy (18), AtomicDemix is guaranteed to recover the ground truth. Our construction is inspired
by [18, 23], based on use of the squared Fejér kernel. However, since the two dual polynomials are coupled
together, the construction is more involved.

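Define the squared Fejér kernel [18] as

$$K(\tau) = \frac{1}{M} \sum_{n=-2M}^{2M} s_n\, e^{j2\pi n \tau}, \qquad (19)$$

where $s_n = \frac{1}{M} \sum_{k=\max(n-M,\,-M)}^{\min(n+M,\,M)} \left( 1 - \left| \frac{k}{M} \right| \right) \left( 1 - \left| \frac{n}{M} - \frac{k}{M} \right| \right)$ with $|s_n| \le 1$. The value of $K(\tau)$ is nonnegative,
attaining the peak at $\tau = 0$ and decaying to zero rapidly with the increase of $|\tau|$.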
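
As a numerical sanity check, the kernel can be evaluated from its Fourier coefficients and compared with the closed form $\left[\sin(\pi M \tau) / (M \sin(\pi \tau))\right]^4$. That closed form is the standard expression for a squared Fejér kernel and is brought in here as an assumption for the comparison; the function names are hypothetical.

```python
import cmath
import math


def fejer_weights(M):
    """Coefficients s_n, n = -2M..2M: (1/M) * sum_k (1 - |k|/M)(1 - |n - k|/M)."""
    s = {}
    for n in range(-2 * M, 2 * M + 1):
        lo, hi = max(n - M, -M), min(n + M, M)
        s[n] = sum(
            (1 - abs(k) / M) * (1 - abs(n - k) / M) for k in range(lo, hi + 1)
        ) / M
    return s


def K_sum(tau, M):
    """K(tau) = (1/M) * sum_n s_n exp(j 2 pi n tau); real because s_n is real and symmetric."""
    s = fejer_weights(M)
    total = sum(s[n] * cmath.exp(2j * math.pi * n * tau) for n in s)
    return total.real / M


def K_closed(tau, M):
    """Closed form [sin(pi M tau) / (M sin(pi tau))]^4, with the limit value 1 at integer tau."""
    x = math.pi * tau
    if abs(math.sin(x)) < 1e-12:
        return 1.0
    return (math.sin(M * x) / (M * math.sin(x))) ** 4


def second_derivative_at_zero(M, h=1e-4):
    """Central finite-difference estimate of K''(0) using the closed form."""
    return (K_closed(h, M) - 2.0 * K_closed(0.0, M) + K_closed(-h, M)) / h ** 2
```

With $M = 8$, `K_sum(0.0, 8)` evaluates to one up to rounding, and `second_derivative_at_zero(8)` is close to $-\frac{4\pi^2}{3}(M^2 - 1) \approx -829.05$, the curvature at the peak used later to normalize the linear system for the coefficients.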

We define two functions $K_g(\tau)$ and $K_{\bar g}(\tau)$ respectively as

\[ K_g(\tau) = \frac{1}{M}\sum_{n=-2M}^{2M} s_n g_n e^{j2\pi n\tau} \quad \text{and} \quad K_{\bar g}(\tau) = \frac{1}{M}\sum_{n=-2M}^{2M} s_n \bar g_n e^{j2\pi n\tau}. \tag{20} \]

We then construct two polynomials $P(\tau)$ and $Q(\tau)$ as

\[ P(\tau) = \sum_{k=1}^{K_1} \alpha_{1k} K(\tau - \tau_{1k}) + \sum_{k=1}^{K_1} \beta_{1k} K'(\tau - \tau_{1k}) + \sum_{k=1}^{K_2} \alpha_{2k} K_g(\tau - \tau_{2k}) + \sum_{k=1}^{K_2} \beta_{2k} K_g'(\tau - \tau_{2k}), \tag{21} \]

and

\[ Q(\tau) = \sum_{k=1}^{K_1} \alpha_{1k} K_{\bar g}(\tau - \tau_{1k}) + \sum_{k=1}^{K_1} \beta_{1k} K_{\bar g}'(\tau - \tau_{1k}) + \sum_{k=1}^{K_2} \alpha_{2k} K(\tau - \tau_{2k}) + \sum_{k=1}^{K_2} \beta_{2k} K'(\tau - \tau_{2k}), \tag{22} \]

where $\tau_{1k} \in T_1$ and $\tau_{2k} \in T_2$. It is straightforward to validate that there exists a corresponding vector $p$ such that (21) and (22) can be equivalently written in the form of (17). Set the coefficients $\alpha_i = [\alpha_{i1}, \ldots, \alpha_{iK_i}]^T$ and $\beta_i = [\beta_{i1}, \ldots, \beta_{iK_i}]^T$, for $i = 1, 2$, by solving the following equations:

\[ \begin{cases} P(\tau_{1k}) = \mathrm{sign}(a_{1k}), & \tau_{1k} \in T_1, \\ P'(\tau_{1k}) = 0, & \tau_{1k} \in T_1, \\ Q(\tau_{2k}) = \mathrm{sign}(a_{2k}), & \tau_{2k} \in T_2, \\ Q'(\tau_{2k}) = 0, & \tau_{2k} \in T_2. \end{cases} \tag{23} \]

The above setting, if it exists, immediately satisfies the first and third conditions in (18). The rest of the proof is to show, under the condition of Theorem 2.1, that a solution of (23) exists with high probability and that, when it exists, the solution satisfies the second and fourth conditions in (18) with high probability, therefore completing the proof.

Example 1. Before proceeding, we demonstrate the above dual polynomial construction by an example. Set $M = 32$. Let $K_1 = 4$ and $K_2 = 6$. We randomly generate the source locations $T_1$ and $T_2$, each satisfying the separation $\Delta \ge 1/M$. The amplitudes of the constructed $P(\tau)$ and $Q(\tau)$ are shown in Fig. 5, which indeed satisfy all the conditions in (18).
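The construction in (21)-(23) can be sketched numerically by working with the coefficient vector $p$ directly. The sketch below is only an illustration under stated assumptions: the source locations, the random seed, and all function and variable names are hypothetical, and the PSF samples $g_n$ are drawn with unit modulus and uniform phase. It solves the interpolation equations (23) as a linear system and evaluates $|P|$ and $|Q|$ on a grid; it does not by itself verify the bound $|P(\tau)| < 1$ away from the sources.

```python
import numpy as np

def fejer_sq_coeffs(M):
    """Return n = -2M..2M and the coefficients s_n of the squared Fejer kernel."""
    n = np.arange(-2 * M, 2 * M + 1)
    s = np.zeros(n.size)
    for idx, nn in enumerate(n):
        i = np.arange(max(nn - M, -M), min(nn + M, M) + 1)
        s[idx] = np.sum((1 - np.abs(i) / M) * (1 - np.abs(nn - i) / M)) / M
    return n, s

rng = np.random.default_rng(0)
M = 32
T1 = np.array([0.10, 0.30, 0.55, 0.80])              # assumed locations, K1 = 4
T2 = np.array([0.05, 0.20, 0.40, 0.60, 0.75, 0.90])  # assumed locations, K2 = 6
K1, K2 = T1.size, T2.size
n, s = fejer_sq_coeffs(M)
g = np.exp(2j * np.pi * rng.random(n.size))          # unit-modulus PSF samples
u1 = np.exp(2j * np.pi * rng.random(K1))             # sign(a_1k)
u2 = np.exp(2j * np.pi * rng.random(K2))             # sign(a_2k)
d = 2j * np.pi * n                                   # derivative factor j*2*pi*n

# Columns of B map [alpha1, beta1, alpha2, beta2] to the coefficients p_n of P.
S1 = np.exp(-2j * np.pi * np.outer(n, T1))
S2 = np.exp(-2j * np.pi * np.outer(n, T2))
B = (s[:, None] / M) * np.hstack(
    [S1, d[:, None] * S1, g[:, None] * S2, (g * d)[:, None] * S2]
)

def evaluate(T, derivative=False):
    """Rows map coefficients c_n to sum_n c_n e^{j2 pi n tau}, or its derivative."""
    rows = np.exp(2j * np.pi * np.outer(T, n))
    return rows * d if derivative else rows

gbar = np.conj(g)
A = np.vstack([
    evaluate(T1) @ B,                                  # P(tau_1k)  = sign(a_1k)
    evaluate(T1, derivative=True) @ B,                 # P'(tau_1k) = 0
    evaluate(T2) @ (gbar[:, None] * B),                # Q(tau_2k)  = sign(a_2k)
    evaluate(T2, derivative=True) @ (gbar[:, None] * B),  # Q'(tau_2k) = 0
])
rhs = np.concatenate([u1, np.zeros(K1), u2, np.zeros(K2)])
p = B @ np.linalg.solve(A, rhs)

grid = np.linspace(0.0, 1.0, 4096, endpoint=False)
abs_P = np.abs(evaluate(grid) @ p)
abs_Q = np.abs(evaluate(grid) @ (gbar * p))
```

Plotting `abs_P` and `abs_Q` against `grid` gives curves of the kind described in Example 1, for this particular (assumed) draw of locations and PSF.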


4.3 Invertibility of (23) 


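We first show in this subsection that the solution of (23) exists with high probability. Let

\[ u_i = [\mathrm{sign}(a_{i1}), \ldots, \mathrm{sign}(a_{iK_i})]^T, \]

for $i = 1, 2$. Rewrite (23) into a matrix form as

\[ \begin{bmatrix} W_{10} & \frac{1}{\sqrt{|K''(0)|}} W_{11} & W_{g0} & \frac{1}{\sqrt{|K''(0)|}} W_{g1} \\ -\frac{1}{\sqrt{|K''(0)|}} W_{11} & -\frac{1}{|K''(0)|} W_{12} & -\frac{1}{\sqrt{|K''(0)|}} W_{g1} & -\frac{1}{|K''(0)|} W_{g2} \\ W_{\bar g 0} & \frac{1}{\sqrt{|K''(0)|}} W_{\bar g 1} & W_{20} & \frac{1}{\sqrt{|K''(0)|}} W_{21} \\ -\frac{1}{\sqrt{|K''(0)|}} W_{\bar g 1} & -\frac{1}{|K''(0)|} W_{\bar g 2} & -\frac{1}{\sqrt{|K''(0)|}} W_{21} & -\frac{1}{|K''(0)|} W_{22} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \sqrt{|K''(0)|}\,\beta_1 \\ \alpha_2 \\ \sqrt{|K''(0)|}\,\beta_2 \end{bmatrix} = \begin{bmatrix} u_1 \\ 0 \\ u_2 \\ 0 \end{bmatrix}, \tag{24} \]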

































Figure 5. The absolute values of the constructed dual polynomials, (a) $|P(\tau)|$ and (b) $|Q(\tau)|$, following (23), with respect to $\tau \in [0,1)$.


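where $K''(0)$ is a scalar, defined as

\[ K''(0) = -\frac{4\pi^2}{3}\left(M^2 - 1\right). \tag{25} \]

The entries of $W_{1\ell} \in \mathbb{C}^{K_1 \times K_1}$, $W_{g\ell} \in \mathbb{C}^{K_1 \times K_2}$, $W_{\bar g\ell} \in \mathbb{C}^{K_2 \times K_1}$ and $W_{2\ell} \in \mathbb{C}^{K_2 \times K_2}$, $\ell = 0, 1, 2$, are specified respectively as

\[ W_{1\ell}(l,k) = K^{(\ell)}(\tau_{1l} - \tau_{1k}), \quad W_{g\ell}(l,k) = K_g^{(\ell)}(\tau_{1l} - \tau_{2k}), \]
\[ W_{\bar g\ell}(l,k) = K_{\bar g}^{(\ell)}(\tau_{2l} - \tau_{1k}), \quad W_{2\ell}(l,k) = K^{(\ell)}(\tau_{2l} - \tau_{2k}). \]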

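For simplicity, we further introduce the following notations:

\[ W_i = \begin{bmatrix} W_{i0} & \frac{1}{\sqrt{|K''(0)|}} W_{i1} \\ -\frac{1}{\sqrt{|K''(0)|}} W_{i1} & -\frac{1}{|K''(0)|} W_{i2} \end{bmatrix}, \quad i = 1, 2, \]

\[ W_g = \begin{bmatrix} W_{g0} & \frac{1}{\sqrt{|K''(0)|}} W_{g1} \\ -\frac{1}{\sqrt{|K''(0)|}} W_{g1} & -\frac{1}{|K''(0)|} W_{g2} \end{bmatrix}, \quad W_{\bar g} = \begin{bmatrix} W_{\bar g0} & \frac{1}{\sqrt{|K''(0)|}} W_{\bar g1} \\ -\frac{1}{\sqrt{|K''(0)|}} W_{\bar g1} & -\frac{1}{|K''(0)|} W_{\bar g2} \end{bmatrix}, \]

and $W = \begin{bmatrix} W_1 & W_g \\ W_{\bar g} & W_2 \end{bmatrix}$. Moreover, we have $W_{\bar g} = W_g^H$. The diagonal blocks $W_i$ of $W$ are deterministic and well-conditioned if the separation $\Delta$ is not too small. This is formalized in the following proposition.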

Proposition 2. [23, Proposition 4.1] Suppose $\Delta \ge 1/M$, then both $W_1$ and $W_2$ are invertible and satisfy the following

\[ \|I - W_i\| \le 0.3623, \tag{26} \]
\[ \|W_i\| \le 1.3623, \tag{27} \]
\[ \|W_i^{-1}\| \le 1.568, \tag{28} \]

for $i = 1, 2$, where $\|\cdot\|$ represents the matrix operator norm.

The off-diagonal block $W_g$ is a random matrix with respect to $g$, which can be written as

\[ W_g = \frac{1}{M}\sum_{n=-2M}^{2M} s_n g_n e_1(n) e_2^H(n) = \sum_{n=-2M}^{2M} E_n, \tag{29} \]
where

\[ e_i(n) = \left[ e^{j2\pi n\tau_{i1}}, \ldots, e^{j2\pi n\tau_{iK_i}}, \frac{-j2\pi n}{\sqrt{|K''(0)|}} e^{j2\pi n\tau_{i1}}, \ldots, \frac{-j2\pi n}{\sqrt{|K''(0)|}} e^{j2\pi n\tau_{iK_i}} \right]^T \in \mathbb{C}^{2K_i}, \quad i = 1, 2, \tag{30} \]

and

\[ E_n = \frac{1}{M} s_n g_n e_1(n) e_2^H(n) \tag{31} \]

is a zero-mean random matrix with $\mathbb{E}[E_n] = \frac{1}{M} s_n \mathbb{E}[g_n] e_1(n) e_2^H(n) = 0$, since $\mathbb{E}[g_n] = 0$. Hence $W_g$ is a sum of independent zero-mean random matrices with $\mathbb{E}[W_g] = 0$. The following proposition establishes that the spectral norm of $W_g$ is bounded with high probability; its proof is given in Appendix E.


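Proposition 3. Assume $M \ge 4$. Let $\delta \in (0, 0.6376)$ and $\eta \in (0,1)$, then $\mathbb{P}\{\|W_g\| \ge \delta\} \le \eta$ provided that

\[ M \ge \frac{46}{\delta^2} K_{\max} \log\left(\frac{2(K_1 + K_2)}{\eta}\right). \tag{32} \]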

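Denote the event $\mathcal{E}_\delta = \{\|W_g\| \le \delta\}$, which holds with probability at least $1 - \eta$ if (32) holds, following Proposition 3. Assume $\mathcal{E}_\delta$ holds for some $0 < \delta < 0.6376$ and $\Delta \ge 1/M$, then

\[ \|I - W\| \le \left\| I - \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix} \right\| + \left\| \begin{bmatrix} 0 & W_g \\ W_{\bar g} & 0 \end{bmatrix} \right\| \le \max_{i=1,2} \|I - W_i\| + \|W_g\| \le 0.3623 + \delta < 1, \]

which yields that $W$ is invertible under $\mathcal{E}_\delta$. Equivalently, under $\mathcal{E}_\delta$ the solution to (23) exists. Write $W^{-1}$ as

\[ W^{-1} = \begin{bmatrix} L_1 & R_1 & L_g & R_g \\ L_{\bar g} & R_{\bar g} & L_2 & R_2 \end{bmatrix}, \]

where $L_i, R_i \in \mathbb{C}^{2K_i \times K_i}$, $i = 1, 2$, $L_g, R_g \in \mathbb{C}^{2K_1 \times K_2}$ and $L_{\bar g}, R_{\bar g} \in \mathbb{C}^{2K_2 \times K_1}$. We can then invert (24) and obtain

\[ \begin{bmatrix} \alpha_1 \\ \sqrt{|K''(0)|}\,\beta_1 \\ \alpha_2 \\ \sqrt{|K''(0)|}\,\beta_2 \end{bmatrix} = W^{-1} \begin{bmatrix} u_1 \\ 0 \\ u_2 \\ 0 \end{bmatrix}, \tag{33} \]

which gives

\[ \begin{bmatrix} \alpha_1 \\ \sqrt{|K''(0)|}\,\beta_1 \end{bmatrix} = L_1 u_1 + L_g u_2 \quad \text{and} \quad \begin{bmatrix} \alpha_2 \\ \sqrt{|K''(0)|}\,\beta_2 \end{bmatrix} = L_{\bar g} u_1 + L_2 u_2. \]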
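The invertibility argument above rests on the chain $\|I - W\| \le \max_i \|I - W_i\| + \|W_g\| < 1$. The following minimal sketch assembles $W$ numerically for one random draw of $g$ and evaluates these norms. The locations, the seed, the value $M = 256$ and all names are illustrative assumptions; the sign convention used for the derivative rows of the stacked vectors does not affect the norms.

```python
import numpy as np

def fejer_sq_coeffs(M):
    """Return n = -2M..2M and the coefficients s_n of the squared Fejer kernel."""
    n = np.arange(-2 * M, 2 * M + 1)
    s = np.zeros(n.size)
    for idx, nn in enumerate(n):
        i = np.arange(max(nn - M, -M), min(nn + M, M) + 1)
        s[idx] = np.sum((1 - np.abs(i) / M) * (1 - np.abs(nn - i) / M)) / M
    return n, s

def stacked_vectors(T, n, M):
    """Columns are e(n): exponentials on top, scaled derivative factors below."""
    k2 = (4 * np.pi ** 2 / 3) * (M ** 2 - 1)      # |K''(0)| as in (25)
    top = np.exp(2j * np.pi * np.outer(T, n))      # shape (K, 4M+1)
    bottom = (-2j * np.pi * n / np.sqrt(k2)) * top
    return np.vstack([top, bottom])                # shape (2K, 4M+1)

def op_norm(A):
    """Matrix operator (spectral) norm."""
    return np.linalg.norm(A, 2)

rng = np.random.default_rng(0)
M = 256
T1 = np.array([0.10, 0.60])          # assumed, well separated
T2 = np.array([0.30, 0.50, 0.85])    # assumed, well separated
n, s = fejer_sq_coeffs(M)
g = np.exp(2j * np.pi * rng.random(n.size))   # random unit-modulus PSF samples

E1, E2 = stacked_vectors(T1, n, M), stacked_vectors(T2, n, M)
W1 = (E1 * (s / M)) @ E1.conj().T             # deterministic diagonal blocks
W2 = (E2 * (s / M)) @ E2.conj().T
Wg = (E1 * (s * g / M)) @ E2.conj().T         # random off-diagonal block
W = np.block([[W1, Wg], [Wg.conj().T, W2]])

dev1 = op_norm(np.eye(W1.shape[0]) - W1)
dev2 = op_norm(np.eye(W2.shape[0]) - W2)
norm_Wg = op_norm(Wg)
dev = op_norm(np.eye(W.shape[0]) - W)
```

Here `dev < 1` certifies that `W` is invertible for this draw, mirroring the role of the event $\mathcal{E}_\delta$ in the proof.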


4.4 Bounding the Dual Polynomials 

Given (33), the rest of the proof is to verify that $|P(\tau)| < 1$, $\forall \tau \notin T_1$ and similarly, $|Q(\tau)| < 1$, $\forall \tau \notin T_2$. Since the expressions for $P(\tau)$ and $Q(\tau)$ are very similar, it is sufficient to establish the above only for $P(\tau)$.

Recalling the form of $P(\tau)$ in (21), the $\ell$th derivative of $P(\tau)$ can be represented as

\[ P^{(\ell)}(\tau) = \sum_{k=1}^{K_1} \alpha_{1k} K^{(\ell)}(\tau - \tau_{1k}) + \sum_{k=1}^{K_1} \beta_{1k} K^{(\ell+1)}(\tau - \tau_{1k}) + \sum_{k=1}^{K_2} \alpha_{2k} K_g^{(\ell)}(\tau - \tau_{2k}) + \sum_{k=1}^{K_2} \beta_{2k} K_g^{(\ell+1)}(\tau - \tau_{2k}), \tag{34} \]
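which can be rewritten as

\[ \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) = v_{1\ell}^H(\tau) \begin{bmatrix} \alpha_1 \\ \sqrt{|K''(0)|}\,\beta_1 \end{bmatrix} + v_{2\ell}^H(\tau) \begin{bmatrix} \alpha_2 \\ \sqrt{|K''(0)|}\,\beta_2 \end{bmatrix}, \tag{35} \]

where

\[ v_{1\ell}(\tau) = \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \left[ \overline{K^{(\ell)}(\tau - \tau_{11})}, \ldots, \overline{K^{(\ell)}(\tau - \tau_{1K_1})}, \frac{\overline{K^{(\ell+1)}(\tau - \tau_{11})}}{\sqrt{|K''(0)|}}, \ldots, \frac{\overline{K^{(\ell+1)}(\tau - \tau_{1K_1})}}{\sqrt{|K''(0)|}} \right]^T, \]

\[ v_{2\ell}(\tau) = \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \left[ \overline{K_g^{(\ell)}(\tau - \tau_{21})}, \ldots, \overline{K_g^{(\ell)}(\tau - \tau_{2K_2})}, \frac{\overline{K_g^{(\ell+1)}(\tau - \tau_{21})}}{\sqrt{|K''(0)|}}, \ldots, \frac{\overline{K_g^{(\ell+1)}(\tau - \tau_{2K_2})}}{\sqrt{|K''(0)|}} \right]^T, \]

and $K''(0)$ is the scalar defined in (25). Using the forms of $K(\tau)$ and $K_g(\tau)$, we can rewrite the above as

\[ v_{1\ell}(\tau) = \frac{1}{M}\sum_{n=-2M}^{2M} s_n \left( \frac{-j2\pi n}{\sqrt{|K''(0)|}} \right)^{\ell} e^{-j2\pi n\tau} e_1(n), \]

\[ v_{2\ell}(\tau) = \frac{1}{M}\sum_{n=-2M}^{2M} s_n \bar g_n \left( \frac{-j2\pi n}{\sqrt{|K''(0)|}} \right)^{\ell} e^{-j2\pi n\tau} e_2(n), \]

where $e_1(n)$ and $e_2(n)$ are defined in (30). Then $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau)$ can be rewritten as

\[ \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) = v_{1\ell}^H(\tau)\left(L_1 u_1 + L_g u_2\right) + v_{2\ell}^H(\tau)\left(L_{\bar g} u_1 + L_2 u_2\right) \tag{36} \]
\[ = \langle u_1, L_1^H v_{1\ell}(\tau) \rangle + \langle u_2, L_g^H v_{1\ell}(\tau) \rangle + \langle u_1, L_{\bar g}^H v_{2\ell}(\tau) \rangle + \langle u_2, L_2^H v_{2\ell}(\tau) \rangle, \tag{37} \]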

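where (36) follows from (33). Let

\[ W_\mu = \mathbb{E}[W] = \begin{bmatrix} \mathbb{E}[W_1] & \mathbb{E}[W_g] \\ \mathbb{E}[W_{\bar g}] & \mathbb{E}[W_2] \end{bmatrix} = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}, \]

and

\[ W_\mu^{-1} = \begin{bmatrix} W_1^{-1} & 0 \\ 0 & W_2^{-1} \end{bmatrix} = \begin{bmatrix} L_{\mu 1} & R_{\mu 1} & 0 & 0 \\ 0 & 0 & L_{\mu 2} & R_{\mu 2} \end{bmatrix}, \]

where $L_{\mu i} \in \mathbb{C}^{2K_i \times K_i}$, $R_{\mu i} \in \mathbb{C}^{2K_i \times K_i}$, $i = 1, 2$. We can then further rewrite (37) as

\[ \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) = \langle u_1, L_{\mu 1}^H v_{1\ell}(\tau) \rangle + \langle u_1, (L_1 - L_{\mu 1})^H v_{1\ell}(\tau) \rangle + \langle u_2, L_g^H v_{1\ell}(\tau) \rangle + \langle u_1, L_{\bar g}^H v_{2\ell}(\tau) \rangle + \langle u_2, L_2^H v_{2\ell}(\tau) \rangle. \tag{38} \]

Denote

\[ \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau) = \langle u_1, L_{\mu 1}^H v_{1\ell}(\tau) \rangle. \]

Our proof proceeds in the following steps:

• Step 1: show that $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau)$ is bounded around $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau)$ for a set of grid points $\tau \in T_{\mathrm{grid}}$;

• Step 2: show that $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau)$ is uniformly bounded around $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau)$ for all $\tau \in [0,1)$;

• Step 3: finally, show that $|P(\tau)| < 1$, $\forall \tau \notin T_1$.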


4.4.1 Proof of Step 1 

Here the goal is to bound the last four residual terms in (38) with high probability on a set of uniform grid points $\tau \in T_{\mathrm{grid}}$ from $[0,1)$ whose size will be specified later. We first record the following useful lemma whose proof is given in Appendix F.

Lemma 1. Under the event $\mathcal{E}_\delta$ for some $\delta \in (0, 1/4]$, we have

\[ \|L_i\| \le 2\|W_\mu^{-1}\|, \quad \text{for } i = 1, 2, \]
\[ \|L_i - L_{\mu i}\| \le 2\|W_\mu^{-1}\|^2 \delta, \quad \text{for } i = 1, 2, \]
\[ \|L_g\| \le 2\|W_\mu^{-1}\|^2 \delta \le 0.8\|W_\mu^{-1}\|, \]
\[ \|L_{\bar g}\| \le 2\|W_\mu^{-1}\|^2 \delta \le 0.8\|W_\mu^{-1}\|. \]

When the signs of the coefficients $a_{ik}$'s are arbitrary, the last four terms in (38) can be bounded by

\[ |\langle u_1, (L_1 - L_{\mu 1})^H v_{1\ell}(\tau) \rangle| \le \|u_1\|_2 \|(L_1 - L_{\mu 1})^H v_{1\ell}(\tau)\|_2 \le C_1 \sqrt{K_1}\,\delta, \]
\[ |\langle u_2, L_g^H v_{1\ell}(\tau) \rangle| \le \|u_2\|_2 \|L_g^H v_{1\ell}(\tau)\|_2 \le C_2 \sqrt{K_2}\,\delta, \]
\[ |\langle u_1, L_{\bar g}^H v_{2\ell}(\tau) \rangle| \le \|u_1\|_2 \|L_{\bar g}^H v_{2\ell}(\tau)\|_2, \]
\[ |\langle u_2, L_2^H v_{2\ell}(\tau) \rangle| \le \|u_2\|_2 \|L_2^H v_{2\ell}(\tau)\|_2, \]

where the last steps of the first two inequalities follow from Lemma 1, and $\|v_{1\ell}(\tau)\|_2 \le C$ for some numerical constant $C$ [23, Lemma 4.9]. By setting $\delta$ properly, we can obtain the bound on $M$ using Proposition 3 and Lemmas 4.6 and 4.7 in [23]. When the signs of the coefficients $a_{ik}$'s are random, we can provide a tighter bound by applying Hoeffding's inequality, following the proofs of [23, Lemma 4.8 and 4.9]. We have the following proposition.

Proposition 4. Suppose A > 1/M. There exists a numerical constant C such that 


M > C max / log 


^ grid I 
V 


Krr 


■log 


- grid I 
V 




■log 


KI + K 2 

■n 


or additionally, if the signs of the coefficients aik’s are i.i.d. generated from a symmetric distribution on the 
complex unit circle, there exists a numerical constant C such that 


M > C max f ^ log 


|T,h 


grid I 


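where $|T_{\mathrm{grid}}|$ is the grid size, then we have

\[ \sup_{\tau_d \in T_{\mathrm{grid}}} |\langle u_1, (L_1 - L_{\mu 1})^H v_{1\ell}(\tau_d) \rangle| \le \epsilon, \quad \ell = 0, 1, 2, 3; \]
\[ \sup_{\tau_d \in T_{\mathrm{grid}}} |\langle u_2, L_g^H v_{1\ell}(\tau_d) \rangle| \le \epsilon, \quad \ell = 0, 1, 2, 3; \]
\[ \sup_{\tau_d \in T_{\mathrm{grid}}} |\langle u_1, L_{\bar g}^H v_{2\ell}(\tau_d) \rangle| \le \epsilon, \quad \ell = 0, 1, 2, 3; \]
\[ \sup_{\tau_d \in T_{\mathrm{grid}}} |\langle u_2, L_2^H v_{2\ell}(\tau_d) \rangle| \le \epsilon, \quad \ell = 0, 1, 2, 3, \]

hold with probability at least $1 - 8\eta$.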
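Denote the event

\[ \mathcal{E}_1 = \left\{ \sup_{\tau_d \in T_{\mathrm{grid}}} \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau_d) - \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau_d) \right| \le \frac{\epsilon}{3}, \ \ell = 0, 1, 2, 3 \right\} \]

for some $\epsilon > 0$. Then by rescaling the constants, it is straightforward that $\mathcal{E}_1$ holds with probability at least $1 - \eta$ as soon as the conditions in Proposition 4 are met.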


4.4.2 Proof of Step 2 

We have shown that the differences between $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau)$ and $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau)$ are bounded on a finite grid. In this step we extend this statement to the continuous domain $\tau \in [0,1)$ by choosing the size of $T_{\mathrm{grid}}$ properly. This is given in the following proposition, whose proof is given in Appendix G.
Proposition 5. Suppose A > 1/M. There exists a numerical constant C such that 


M > C max < log 


2fM(Ki+K2)\ 1 




er] 


^A'maxlog 


M {Ki + K 2 ) 

e?7 


:^^Lxlog 


K 1 +K 2 

1 


or additionally, if the signs of the coefficients Uik’s are i.i.d. generated from a symmetric distribution on the 
complex unit circle, there exists a numerical constant C such that 


^ j 1 , 2 fM{Kl+K2) 

M >C max <j — log ' ^ ’ 


er] 


I o A"max log 


K 1 +K 2 

1 


log 


M {Ki + K 2 ) 

er] 


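then we have

\[ \mathbb{P}\left\{ \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) - \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau) \right| \le \epsilon, \ \forall \tau \in [0,1), \ \ell = 0, 1, 2, 3 \right\} \ge 1 - \eta. \]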


4.4.3 Proof of Step 3 

This step follows essentially the same procedure as in [23, Lemma 4.13 and 4.14], where we divide $[0,1)$ into

\[ T^i_{\mathrm{near}} = \bigcup_{k=1}^{K_i} T^{i,k}_{\mathrm{near}} = \bigcup_{k=1}^{K_i} [\tau_{ik} - \tau_s, \tau_{ik} + \tau_s], \quad \text{and} \quad T^i_{\mathrm{far}} = [0,1) \setminus T^i_{\mathrm{near}}, \tag{39} \]

for $i = 1, 2$, where $\tau_s = 8.245 \times 10^{-2}/M$. Then, conditioned on the event in Proposition 5, one can bound $|P(\tau)| < 1$ in $T^1_{\mathrm{near}}$ and $T^1_{\mathrm{far}}$ respectively following straightforward calculus. We shall omit the details and refer interested readers to [23, Lemma 4.13 and 4.14]. We have the following proposition.


Proposition 6. Suppose A > 1/M. There exists a numerical constant C such that 


M > C max < log 


2 fM{Ki + K2) 


1 


K 

1 -*'-max 


log 


M (ATi + K2) 


1 


• ^max log 


Ad + K 2 

V 


or additionally, if the signs of the coefficients Uik’s are i.i.d. generated from a symmetric distribution on the 
complex unit circle, there exists a numerical constant C such that 


M > C max < log 


2 fM{Ki^K2) 


K 


log 


K. 


log 


M (Ki + K2) 


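then we have

\[ |P(\tau)| \le 1 - C_p M^2 (\tau - \tau_{1k})^2 < 1, \quad \tau \in T^{1,k}_{\mathrm{near}} \setminus \{\tau_{1k}\}, \quad k = 1, \ldots, K_1, \]
\[ |P(\tau) - \mathrm{sign}(a_{1k})| \le C_p' M^2 (\tau - \tau_{1k})^2, \quad \tau \in T^{1,k}_{\mathrm{near}}, \quad k = 1, \ldots, K_1, \]
\[ |P(\tau)| \le 1 - C_p'' < 1, \quad \tau \in T^1_{\mathrm{far}}, \]

with probability at least $1 - \eta$, where $C_p$, $C_p'$ and $C_p''$ are some positive numerical constants.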
4.5 Finishing the Proof

The proof of Theorem 2.1 is now complete since we have established that $P(\tau)$ and $Q(\tau)$ constructed in (21) and (22) are indeed valid dual certificates under the condition of Theorem 2.1.


5 Proof of Theorem 2.2 

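We first provide a proposition on optimality conditions of (9), which is proved in Appendix H.

Proposition 7. $\{\hat x_1, \hat x_2\}$ is the minimizer of (9) if and only if the following holds:

\[ \|y - (\hat x_1 + g \odot \hat x_2)\|_{\mathcal{A}}^* \le \lambda_w, \]
\[ \|\bar g \odot (y - (\hat x_1 + g \odot \hat x_2))\|_{\mathcal{A}}^* \le \lambda_w, \]
\[ \langle y - (\hat x_1 + g \odot \hat x_2), \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} = \lambda_w \|\hat x_1\|_{\mathcal{A}} + \lambda_w \|\hat x_2\|_{\mathcal{A}}. \]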

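Let $e_1 = \hat x_1 - x_1^\star$ and $e_2 = \hat x_2 - x_2^\star$. Moreover, let $\nu_1$ and $\nu_2$ be the corresponding representing measures [18, 37] of $e_1$ and $e_2$, respectively, which are given as

\[ e_1 = \int_0^1 c(\tau)\,\nu_1(d\tau), \quad e_2 = \int_0^1 c(\tau)\,\nu_2(d\tau). \]

Therefore, we have $\|e_i\|_{\mathcal{A}} = \|\nu_i\|_{TV}$, where $\|\cdot\|_{TV}$ is the total variation norm of the representing measure. Define

\[ I^k_{i,0} = \left| \int_{T^{i,k}_{\mathrm{near}}} \nu_i(d\tau) \right|, \]
\[ I^k_{i,1} = (4M+1) \left| \int_{T^{i,k}_{\mathrm{near}}} (\tau - \tau_{ik})\,\nu_i(d\tau) \right|, \]
\[ I^k_{i,2} = \frac{(4M+1)^2}{2} \int_{T^{i,k}_{\mathrm{near}}} (\tau - \tau_{ik})^2\,|\nu_i|(d\tau), \]

and $I_{i,j} = \sum_{k=1}^{K_i} I^k_{i,j}$ for $j = 0, 1, 2$ and $i = 1, 2$, where $T^i_{\mathrm{near}}$ and $T^i_{\mathrm{far}}$ are defined in (39). We have the following proposition whose proof can be found in Appendix I.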


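Proposition 8. Assume the noise is bounded as $\|w\|_2 \le \sigma_w$. Set $\lambda_w = C_w \sigma_w \sqrt{4M+1}$, for some constant $C_w > 1$ large enough, then we have

\[ \|e_1\|_2 + \|e_2\|_2 \le \sqrt{4M+1} \sum_{i=1}^{2} \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right), \tag{40} \]

\[ \|e_1 + g \odot e_2\|_2^2 \le 2\lambda_w \sum_{i=1}^{2} \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right). \tag{41} \]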

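Hence the rest is to provide an upper bound on the term $\sum_{i=1}^{2} \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right)$. We first present the following proposition to control the sum of the zeroth moment terms $\sum_{i=1}^{2} I_{i,0}$ and the sum of the first moment terms $\sum_{i=1}^{2} I_{i,1}$, whose proof is given in Appendix J.

Proposition 9. Under the conditions in Theorem 2.2, there exist some numerical constants $C_0$ and $C_1$, such that

\[ \sum_{i=1}^{2} I_{i,0} \le C_0 \left( \lambda_w \sqrt{\frac{K_{\max} \log M}{M}} + \sum_{i=1}^{2} I_{i,2} + \sum_{i=1}^{2} \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} \right), \]

\[ \sum_{i=1}^{2} I_{i,1} \le C_1 \left( \lambda_w \sqrt{\frac{K_{\max} \log M}{M}} + \sum_{i=1}^{2} I_{i,2} + \sum_{i=1}^{2} \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} \right), \]

with high probability given in Theorem 2.2.

What remains is to bound $\sum_{i=1}^{2} \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{i=1}^{2} I_{i,2}$, which is given in the following proposition proved in Appendix K.

Proposition 10. Under the conditions in Theorem 2.2, there exists a numerical constant $C$, such that

\[ \sum_{i=1}^{2} \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{i=1}^{2} I_{i,2} \le C \lambda_w \sqrt{\frac{K_{\max} \log M}{M}} \]

holds with high probability given in Theorem 2.2.

Combining Propositions 8, 9 and 10, there exists some constant $C$ such that

\[ \frac{1}{\sqrt{4M+1}} \left( \|e_1\|_2 + \|e_2\|_2 \right) \le C \lambda_w \frac{\sqrt{K_{\max} \log M}}{\sqrt{M}} \le C_1 \sigma_w \sqrt{K_{\max} \log M}, \]

and

\[ \frac{1}{\sqrt{4M+1}} \|e_1 + g \odot e_2\|_2 \le \frac{1}{\sqrt{4M+1}} \sqrt{C \lambda_w^2 \frac{\sqrt{K_{\max} \log M}}{\sqrt{M}}} \le C_2 \sigma_w \left( \frac{K_{\max} \log M}{M} \right)^{1/4}. \]

6 Conclusions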

We propose a convex optimization method based on atomic norm minimization to super-resolve two point 
source models from the measurements of their superposition, where each point source signal is convolved 
with a different low-pass point spread function. It is demonstrated, with high probability, that the point 
source locations of each modality can be simultaneously determined perfectly in the noise-free setting, from 
a near-optimal number of measurements when each point source signal satisfies a mild separation condition,
and the point spread functions are randomly generated in the frequency domain. The proposed algorithm is 
also robust in the presence of bounded noise. 

Our algorithmic framework and the proof methodology can be extended straightforwardly to handle more 
than two modalities when all of the modalities obey the conditions set forth in the current paper. There are 
a few possible future research directions. In applications such as multi-user detection, only a small number 
of users are active out of all the possible users. It will then be of great interest to simultaneously identify a 
small set of active users as well as identify their corresponding point source signals. In addition, it will also 
be of interest to develop performance guarantees of the proposed algorithm under milder conditions of the 
point spread functions, for example when they are deterministic but weakly correlated. 
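A Useful Lemmas

Lemma 2. [38, noncommutative Bernstein's inequality] Let $\{E_n\}$ be a finite sequence of independent random matrices with dimensions $d_1 \times d_2$. Suppose that each random matrix satisfies

\[ \mathbb{E}[E_n] = 0, \quad \text{and} \quad \|E_n\| \le R \text{ almost surely.} \]

Define

\[ \sigma^2 = \max\left\{ \left\| \sum_n \mathbb{E}[E_n E_n^H] \right\|, \left\| \sum_n \mathbb{E}[E_n^H E_n] \right\| \right\}. \]

Then for any $t \ge 0$,

\[ \mathbb{P}\left\{ \left\| \sum_n E_n \right\| \ge t \right\} \le (d_1 + d_2) \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right). \]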
Lemma 3. [39, Bernstein's polynomial inequality] Suppose $F(z)$ is a polynomial of degree $N$ with complex coefficients, then

\[ \sup_{|z| \le 1} |F'(z)| \le N \sup_{|z| \le 1} |F(z)|. \]
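Lemma 3 can be illustrated numerically. The sketch below draws a random complex polynomial (the degree, seed and grid size are arbitrary assumptions) and compares the two suprema on a fine grid of the unit circle, where by the maximum modulus principle the suprema over the closed disk are attained. Because the suprema are only approximated on a grid, the computed ratio is an estimate.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

rng = np.random.default_rng(0)
N = 12                                              # polynomial degree (assumed)
# Coefficients ordered from degree 0 up to degree N.
coeffs = rng.standard_normal(N + 1) + 1j * rng.standard_normal(N + 1)

# Fine grid on the unit circle.
z = np.exp(2j * np.pi * np.linspace(0.0, 1.0, 8192, endpoint=False))

sup_F = np.max(np.abs(npoly.polyval(z, coeffs)))
sup_dF = np.max(np.abs(npoly.polyval(z, npoly.polyder(coeffs))))

# Bernstein's inequality predicts ratio <= 1 (up to grid discretization error).
ratio = sup_dF / (N * sup_F)
```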


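Lemma 4. [40, Hoeffding's inequality] Let the components of $u \in \mathbb{C}^{K}$ be sampled i.i.d. from a symmetric distribution on the complex unit circle, $w \in \mathbb{C}^{K}$, and $t$ be a positive real number. Then

\[ \mathbb{P}\{ |\langle u, w \rangle| \ge t \} \le 4 e^{-\frac{t^2}{4\|w\|_2^2}}. \]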
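A quick Monte Carlo experiment gives a feel for the Hoeffding-type tail bound of Lemma 4, taken here in the form $4\exp(-t^2/(4\|w\|_2^2))$. Everything below is an illustrative assumption: the dimension, the number of trials, the threshold $t = 3\|w\|_2$, and the use of uniformly distributed phases as one example of a symmetric distribution on the unit circle.

```python
import numpy as np

rng = np.random.default_rng(0)
K, trials = 16, 50_000                      # illustrative sizes

w = rng.standard_normal(K) + 1j * rng.standard_normal(K)    # fixed vector
u = np.exp(2j * np.pi * rng.random((trials, K)))            # i.i.d. uniform phases

inner = u @ np.conj(w)                      # one inner product per trial
t = 3.0 * np.linalg.norm(w)

empirical = np.mean(np.abs(inner) >= t)     # empirical tail frequency
bound = 4 * np.exp(-t ** 2 / (4 * np.linalg.norm(w) ** 2))
```

In this regime the empirical frequency is far below the bound, which is expected since the bound is not tight.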


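B Proof of Dual Problem (15)

The Lagrangian function of (8) is given as

\[ L(x_1, x_2, p) = \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} + \langle p, y - x_1 - g \odot x_2 \rangle_{\mathbb{R}}, \]

whose infimum over $x_1$ and $x_2$ can be found as

\[ D(p) = \inf_{x_1, x_2} L(x_1, x_2, p) \]
\[ = \inf_{x_1, x_2} \left\{ \|x_1\|_{\mathcal{A}} - \langle p, x_1 \rangle_{\mathbb{R}} + \|x_2\|_{\mathcal{A}} - \langle p, g \odot x_2 \rangle_{\mathbb{R}} + \langle p, y \rangle_{\mathbb{R}} \right\} \]
\[ = \inf_{x_1, x_2} \left\{ \|x_1\|_{\mathcal{A}} - \langle p, x_1 \rangle_{\mathbb{R}} + \|x_2\|_{\mathcal{A}} - \langle \bar g \odot p, x_2 \rangle_{\mathbb{R}} + \langle p, y \rangle_{\mathbb{R}} \right\} \]
\[ = \inf_{x_1} \left\{ \|x_1\|_{\mathcal{A}} - \langle p, x_1 \rangle_{\mathbb{R}} \right\} + \inf_{x_2} \left\{ \|x_2\|_{\mathcal{A}} - \langle \bar g \odot p, x_2 \rangle_{\mathbb{R}} \right\} + \langle p, y \rangle_{\mathbb{R}}. \tag{42} \]

Plugging into (42) the facts that

\[ \inf_{x_i} \left\{ \|x_i\|_{\mathcal{A}} - \langle q_i, x_i \rangle_{\mathbb{R}} \right\} = \begin{cases} 0, & \|q_i\|_{\mathcal{A}}^* \le 1, \\ -\infty, & \text{otherwise}, \end{cases} \]

for $i = 1, 2$, with $q_1 = p$ and $q_2 = \bar g \odot p$, we can have the dual problem of (8) as given in (15).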

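C Proof of $T_1 \subseteq \hat T_1$ and $T_2 \subseteq \hat T_2$

If $T_1 \setminus \hat T_1 \ne \emptyset$ or $T_2 \setminus \hat T_2 \ne \emptyset$, there exists $|P(\tau)| < 1$ for $\tau \in T_1 \setminus \hat T_1$ or $|Q(\tau)| < 1$ for $\tau \in T_2 \setminus \hat T_2$. Then we have

\[ \langle p, y \rangle_{\mathbb{R}} = \langle p, x_1^\star \rangle_{\mathbb{R}} + \langle p, g \odot x_2^\star \rangle_{\mathbb{R}} \]
\[ = \left\langle p, \sum_{k=1}^{K_1} a_{1k} c(\tau_{1k}) \right\rangle_{\mathbb{R}} + \left\langle \bar g \odot p, \sum_{k=1}^{K_2} a_{2k} c(\tau_{2k}) \right\rangle_{\mathbb{R}} \]
\[ = \sum_{\tau_{1k} \in T_1 \cap \hat T_1} \mathrm{Re}\left( \bar a_{1k} P(\tau_{1k}) \right) + \sum_{\tau_{1k} \in T_1 \setminus \hat T_1} \mathrm{Re}\left( \bar a_{1k} P(\tau_{1k}) \right) + \sum_{\tau_{2k} \in T_2 \cap \hat T_2} \mathrm{Re}\left( \bar a_{2k} Q(\tau_{2k}) \right) + \sum_{\tau_{2k} \in T_2 \setminus \hat T_2} \mathrm{Re}\left( \bar a_{2k} Q(\tau_{2k}) \right) \]
\[ < \sum_{\tau_{1k} \in T_1 \cap \hat T_1} |a_{1k}| + \sum_{\tau_{1k} \in T_1 \setminus \hat T_1} |a_{1k}| + \sum_{\tau_{2k} \in T_2 \cap \hat T_2} |a_{2k}| + \sum_{\tau_{2k} \in T_2 \setminus \hat T_2} |a_{2k}| \]
\[ = \|x_1^\star\|_{\mathcal{A}} + \|x_2^\star\|_{\mathcal{A}}, \]

where the strict inequality violates strong duality. Therefore, $T_1 \subseteq \hat T_1$ and $T_2 \subseteq \hat T_2$.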


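D Proof of Proposition 1

Proof. Since

\[ \|p\|_{\mathcal{A}}^* = \sup_{\tau \in [0,1)} \left| \sum_{n=-2M}^{2M} p_n e^{j2\pi n\tau} \right| = \sup_{\tau \in [0,1)} |P(\tau)| \le 1, \]
\[ \|\bar g \odot p\|_{\mathcal{A}}^* = \sup_{\tau \in [0,1)} \left| \sum_{n=-2M}^{2M} \bar g_n p_n e^{j2\pi n\tau} \right| = \sup_{\tau \in [0,1)} |Q(\tau)| \le 1, \]

the vector $p$ satisfying (18) is dual feasible. First,

\[ \langle p, y \rangle_{\mathbb{R}} = \langle p, x_1^\star + g \odot x_2^\star \rangle_{\mathbb{R}} = \langle p, x_1^\star \rangle_{\mathbb{R}} + \langle \bar g \odot p, x_2^\star \rangle_{\mathbb{R}} \]
\[ = \sum_{k=1}^{K_1} \mathrm{Re}\left( \bar a_{1k} \sum_{n=-2M}^{2M} p_n e^{j2\pi n\tau_{1k}} \right) + \sum_{k=1}^{K_2} \mathrm{Re}\left( \bar a_{2k} \sum_{n=-2M}^{2M} \bar g_n p_n e^{j2\pi n\tau_{2k}} \right) \]
\[ = \sum_{k=1}^{K_1} \mathrm{Re}\left( \bar a_{1k}\,\mathrm{sign}(a_{1k}) \right) + \sum_{k=1}^{K_2} \mathrm{Re}\left( \bar a_{2k}\,\mathrm{sign}(a_{2k}) \right) \]
\[ = \sum_{k=1}^{K_1} |a_{1k}| + \sum_{k=1}^{K_2} |a_{2k}| \ge \|x_1^\star\|_{\mathcal{A}} + \|x_2^\star\|_{\mathcal{A}}. \]

Also, we have

\[ \langle p, y \rangle_{\mathbb{R}} = \langle p, x_1^\star \rangle_{\mathbb{R}} + \langle \bar g \odot p, x_2^\star \rangle_{\mathbb{R}} \le \|p\|_{\mathcal{A}}^* \|x_1^\star\|_{\mathcal{A}} + \|\bar g \odot p\|_{\mathcal{A}}^* \|x_2^\star\|_{\mathcal{A}} \le \|x_1^\star\|_{\mathcal{A}} + \|x_2^\star\|_{\mathcal{A}}, \]

which gives $\langle p, y \rangle_{\mathbb{R}} = \|x_1^\star\|_{\mathcal{A}} + \|x_2^\star\|_{\mathcal{A}}$. This implies that $p$ is a dual optimal solution of (15), and that $x_1^\star$ and $x_2^\star$ are the primal optimal solutions of (8).

Now validate the uniqueness of $x_1^\star$ and $x_2^\star$. Suppose there is a different optimal solution of (8), which can be written as $\hat x_i = \sum_{k=1}^{\hat K_i} \hat a_{ik} c(\hat\tau_{ik})$, where $\hat T_i = \{\hat\tau_{ik} \mid k = 1, \ldots, \hat K_i\}$ and $\|\hat x_i\|_{\mathcal{A}} = \sum_{k=1}^{\hat K_i} |\hat a_{ik}|$ for $i = 1, 2$, and it satisfies $y = \hat x_1 + g \odot \hat x_2$. If $\hat T_i = T_i$ for $i = 1, 2$, we have $\hat x_1 = x_1^\star$ and $\hat x_2 = x_2^\star$ straightforwardly. We then consider the case when $\hat T_i \ne T_i$ for at least one $i$. We have

\[ \langle p, y \rangle_{\mathbb{R}} = \langle p, \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} = \langle p, \hat x_1 \rangle_{\mathbb{R}} + \langle \bar g \odot p, \hat x_2 \rangle_{\mathbb{R}} \]
\[ = \sum_{\hat\tau_{1k} \in \hat T_1 \cap T_1} \mathrm{Re}\left( \bar{\hat a}_{1k} P(\hat\tau_{1k}) \right) + \sum_{\hat\tau_{1k} \in \hat T_1 \setminus T_1} \mathrm{Re}\left( \bar{\hat a}_{1k} P(\hat\tau_{1k}) \right) + \sum_{\hat\tau_{2k} \in \hat T_2 \cap T_2} \mathrm{Re}\left( \bar{\hat a}_{2k} Q(\hat\tau_{2k}) \right) + \sum_{\hat\tau_{2k} \in \hat T_2 \setminus T_2} \mathrm{Re}\left( \bar{\hat a}_{2k} Q(\hat\tau_{2k}) \right) \]
\[ \le \sum_{\hat\tau_{1k} \in \hat T_1 \cap T_1} |\hat a_{1k}| + \sum_{\hat\tau_{1k} \in \hat T_1 \setminus T_1} \mathrm{Re}\left( \bar{\hat a}_{1k} P(\hat\tau_{1k}) \right) + \sum_{\hat\tau_{2k} \in \hat T_2 \cap T_2} |\hat a_{2k}| + \sum_{\hat\tau_{2k} \in \hat T_2 \setminus T_2} \mathrm{Re}\left( \bar{\hat a}_{2k} Q(\hat\tau_{2k}) \right) \]
\[ < \sum_{\hat\tau_{1k} \in \hat T_1 \cap T_1} |\hat a_{1k}| + \sum_{\hat\tau_{1k} \in \hat T_1 \setminus T_1} |\hat a_{1k}| + \sum_{\hat\tau_{2k} \in \hat T_2 \cap T_2} |\hat a_{2k}| + \sum_{\hat\tau_{2k} \in \hat T_2 \setminus T_2} |\hat a_{2k}| \]
\[ = \|\hat x_1\|_{\mathcal{A}} + \|\hat x_2\|_{\mathcal{A}}, \]

which violates strong duality. Thus $(x_1^\star, x_2^\star)$ is the unique primal optimal solution of (8). □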


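E Proof of Proposition 3

Proof. To apply Lemma 2 to (29), we first bound $\|E_n\|$ as

\[ \|E_n\| = \left\| \frac{1}{M} s_n g_n e_1(n) e_2^H(n) \right\| = \frac{1}{M} |s_n| \sqrt{K_1 \left( 1 + \frac{(2\pi n)^2}{|K''(0)|} \right)} \sqrt{K_2 \left( 1 + \frac{(2\pi n)^2}{|K''(0)|} \right)} \]
\[ \le \frac{1}{M} \left( \max_{|n| \le 2M} |s_n| \right) \sqrt{K_1 K_2} \left( 1 + \max_{|n| \le 2M} \frac{(2\pi n)^2}{|K''(0)|} \right) \le \frac{14 K_{\max}}{M} := R, \quad \text{for } M \ge 4, \]

where $\max_{|n| \le 2M} |s_n| \le 1$, and $1 + \max_{|n| \le 2M} \frac{(2\pi n)^2}{|K''(0)|} = 1 + \frac{12 M^2}{M^2 - 1} \le 14$ for $M \ge 4$ [23]. Furthermore,

\[ \left\| \sum_{n=-2M}^{2M} \mathbb{E}\left[ E_n E_n^H \right] \right\| = \left\| \sum_{n=-2M}^{2M} \mathbb{E}\left[ \frac{1}{M} s_n g_n e_1(n) e_2^H(n) \cdot \frac{1}{M} s_n \bar g_n e_2(n) e_1^H(n) \right] \right\| \]
\[ = \left\| \frac{1}{M^2} \sum_{n=-2M}^{2M} s_n^2 K_2 \left( 1 + \frac{(2\pi n)^2}{|K''(0)|} \right) e_1(n) e_1^H(n) \right\| \]
\[ \le \frac{1}{M} K_2 \left( \max_{|n| \le 2M} |s_n| \right) \left( 1 + \max_{|n| \le 2M} \frac{(2\pi n)^2}{|K''(0)|} \right) \left\| \frac{1}{M} \sum_{n=-2M}^{2M} s_n e_1(n) e_1^H(n) \right\| \]
\[ \le \frac{14}{M} K_2 \|W_1\| \le \frac{20 K_{\max}}{M}, \]

where the last inequality follows from (27). Similarly we can obtain $\left\| \sum_{n=-2M}^{2M} \mathbb{E}\left[ E_n^H E_n \right] \right\| \le \frac{20 K_{\max}}{M}$. Hence,

\[ \sigma^2 = \max\left\{ \left\| \sum_n \mathbb{E}\left[ E_n E_n^H \right] \right\|, \left\| \sum_n \mathbb{E}\left[ E_n^H E_n \right] \right\| \right\} \le \frac{20 K_{\max}}{M}. \]

Applying Lemma 2, for $0 < \delta < 0.6376$, we have

\[ \mathbb{P}\{\|W_g\| \ge \delta\} \le 2(K_1 + K_2) \exp\left( \frac{-\delta^2/2}{\frac{20 K_{\max}}{M} + \frac{14 K_{\max}}{3M}\delta} \right) \le 2(K_1 + K_2) \exp\left( -\frac{\delta^2 M}{46 K_{\max}} \right) \le \eta, \tag{43} \]

if $M \ge \frac{46}{\delta^2} K_{\max} \log\left( \frac{2(K_1 + K_2)}{\eta} \right)$. □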
F Proof of Lemma 1

Proof. For invertible $A$ and $B$ that satisfy $\|A - B\| \|B^{-1}\| \le 1/2$, we have [23]

\[ \|A^{-1}\| \le 2\|B^{-1}\|, \quad \text{and} \quad \|A^{-1} - B^{-1}\| \le 2\|B^{-1}\|^2 \|A - B\|. \]

We apply the above to $A = W$ and $B = W_\mu$. From (28), we have $\|W_\mu^{-1}\| \le 1.568$. Under the event $\mathcal{E}_\delta$, $\|W - W_\mu\| = \|W_g\| \le \delta$. Therefore as soon as $\delta \le \frac{1}{4} \le \frac{1}{2\|W_\mu^{-1}\|}$¹ we have

\[ \|W^{-1}\| \le 2\|W_\mu^{-1}\|, \]
\[ \|W^{-1} - W_\mu^{-1}\| \le 2\|W_\mu^{-1}\|^2 \|W - W_\mu\| \le 2\|W_\mu^{-1}\|^2 \delta. \]

Finally, because the operator norm of a matrix dominates that of its submatrices, we have

\[ \|L_i - L_{\mu i}\| \le 2\|W_\mu^{-1}\|^2 \delta, \quad \text{for } i = 1, 2, \]
\[ \|L_g\| \le 2\|W_\mu^{-1}\|^2 \delta, \]
\[ \|L_{\bar g}\| \le 2\|W_\mu^{-1}\|^2 \delta, \]

and $\|L_i\| \le \|L_i - L_{\mu i}\| + \|L_{\mu i}\| \le 2\|W_\mu^{-1}\|^2 \delta + \|W_\mu^{-1}\| \le 2\|W_\mu^{-1}\|$ for $i = 1, 2$, where we have used $\|W_\mu^{-1}\| \le 1.568$ and $\delta \le 1/4$. □
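The two perturbation bounds used at the start of this proof can be checked on a small random instance. In the sketch below the matrices, dimension and seed are arbitrary assumptions; the perturbation is rescaled so that $\|A - B\|\,\|B^{-1}\| = 1/2$ holds, which is the boundary case of the hypothesis.

```python
import numpy as np

def op_norm(X):
    """Matrix operator (spectral) norm."""
    return np.linalg.norm(X, 2)

rng = np.random.default_rng(1)
dim = 6
B = np.eye(dim) + 0.05 * rng.standard_normal((dim, dim))   # well-conditioned B
B_inv = np.linalg.inv(B)

# Rescale a random perturbation so that ||A - B|| * ||B^{-1}|| = 1/2.
Delta = rng.standard_normal((dim, dim))
Delta *= 0.5 / (op_norm(Delta) * op_norm(B_inv))
A = B + Delta
A_inv = np.linalg.inv(A)

lhs_inverse = op_norm(A_inv)                    # should be <= 2 ||B^{-1}||
rhs_inverse = 2 * op_norm(B_inv)
lhs_difference = op_norm(A_inv - B_inv)         # should be <= 2 ||B^{-1}||^2 ||A - B||
rhs_difference = 2 * op_norm(B_inv) ** 2 * op_norm(A - B)
```

Both bounds follow from the Neumann series for $(I + B^{-1}(A - B))^{-1}$, which is why the factor 2 appears once the product of norms is at most 1/2.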


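G Proof of Proposition 5

Proof. Conditioned on the event $\mathcal{E}_\delta$ with $\delta \in (0, 1/4]$, we have

\[ \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) \right| \le |\langle u_1, L_1^H v_{1\ell}(\tau) \rangle| + |\langle u_2, L_g^H v_{1\ell}(\tau) \rangle| + |\langle u_1, L_{\bar g}^H v_{2\ell}(\tau) \rangle| + |\langle u_2, L_2^H v_{2\ell}(\tau) \rangle| \]
\[ \le \|u_1\|_2 \|L_1\| \|v_{1\ell}(\tau)\|_2 + \|u_2\|_2 \|L_g\| \|v_{1\ell}(\tau)\|_2 + \|u_1\|_2 \|L_{\bar g}\| \|v_{2\ell}(\tau)\|_2 + \|u_2\|_2 \|L_2\| \|v_{2\ell}(\tau)\|_2 \]
\[ \le \sqrt{K_1} \cdot 2\|W_\mu^{-1}\| \cdot \frac{4M+1}{M} 4^{\ell} \sqrt{14 K_1} + \sqrt{K_2} \cdot 0.8\|W_\mu^{-1}\| \cdot \frac{4M+1}{M} 4^{\ell} \sqrt{14 K_1} \]
\[ \quad + \sqrt{K_1} \cdot 0.8\|W_\mu^{-1}\| \cdot \frac{4M+1}{M} 4^{\ell} \sqrt{14 K_2} + \sqrt{K_2} \cdot 2\|W_\mu^{-1}\| \cdot \frac{4M+1}{M} 4^{\ell} \sqrt{14 K_2} \tag{44} \]
\[ \le C (K_1 + K_2), \]

for some universal constant $C$. In (44), we applied Lemma 1, $\|u_i\|_2 = \sqrt{K_i}$ for $i = 1, 2$, and

\[ \|v_{1\ell}(\tau)\|_2 = \left\| \frac{1}{M} \sum_{n=-2M}^{2M} s_n \left( \frac{-j2\pi n}{\sqrt{|K''(0)|}} \right)^{\ell} e^{-j2\pi n\tau} e_1(n) \right\|_2 \]
\[ \le \frac{1}{M}(4M+1) \left( \max_{|n| \le 2M} |s_n| \right) \left( \max_{|n| \le 2M} \left| \frac{j2\pi n}{\sqrt{|K''(0)|}} \right|^{\ell} \right) \max_{|n| \le 2M} \|e_1(n)\|_2 \]
\[ \le \frac{1}{M}(4M+1)\, 4^{\ell} \sqrt{K_1} \max_{|n| \le 2M} \sqrt{1 + \frac{(2\pi n)^2}{|K''(0)|}} \le \frac{1}{M}(4M+1)\, 4^{\ell} \sqrt{14 K_1}, \tag{45} \]

and similarly

\[ \|v_{2\ell}(\tau)\|_2 \le \frac{1}{M}(4M+1)\, 4^{\ell} \sqrt{14 K_2}. \]

In (45) we have used $\max_{|n| \le 2M} |s_n| \le 1$, $\max_{|n| \le 2M} \frac{|2\pi n|}{\sqrt{|K''(0)|}} \le 4$ for $M \ge 2$, and $1 + \max_{|n| \le 2M} \frac{(2\pi n)^2}{|K''(0)|} \le 14$ for $M \ge 4$.

¹This choice of $\delta$ is not unique but good enough for our purpose.

Using Lemma 3, we have

\[ \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau_a) - \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau_b) \right| \le \left| e^{j2\pi\tau_a} - e^{j2\pi\tau_b} \right| \sup_{\tau \in [0,1]} \left| \frac{d\, \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(e^{j2\pi\tau})}{d e^{j2\pi\tau}} \right| \]
\[ \le \left| e^{j2\pi\tau_a} - e^{j2\pi\tau_b} \right| \cdot 2M \sup_{\tau \in [0,1]} \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) \right| \le 4\pi |\tau_a - \tau_b| \cdot 2M \cdot C (K_1 + K_2). \]

Note that similar bounds also hold for $\frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau)$. Conditioned on the event $\mathcal{E}_\delta \cap \mathcal{E}_1$ with $\delta \in (0, 1/4]$, we have

\[ \left| \frac{P^{(\ell)}(\tau) - \bar P^{(\ell)}(\tau)}{\sqrt{|K''(0)|}^{\,\ell}} \right| \le \left| \frac{P^{(\ell)}(\tau) - P^{(\ell)}(\tau_d)}{\sqrt{|K''(0)|}^{\,\ell}} \right| + \left| \frac{P^{(\ell)}(\tau_d) - \bar P^{(\ell)}(\tau_d)}{\sqrt{|K''(0)|}^{\,\ell}} \right| + \left| \frac{\bar P^{(\ell)}(\tau_d) - \bar P^{(\ell)}(\tau)}{\sqrt{|K''(0)|}^{\,\ell}} \right| \]
\[ \le 4\pi |\tau - \tau_d| \cdot 2M \cdot C (K_1 + K_2) + \frac{\epsilon}{3} + 4\pi |\tau_d - \tau| \cdot 2M \cdot C (K_1 + K_2), \]

for any $\tau \in [0,1]$, where $\tau_d \in T_{\mathrm{grid}}$. By setting the grid size $|T_{\mathrm{grid}}| = \frac{24\pi C M (K_1 + K_2)}{\epsilon}$, we have $|\tau_d - \tau| \le \frac{\epsilon}{24\pi C M (K_1 + K_2)}$, which yields

\[ \left| \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} P^{(\ell)}(\tau) - \frac{1}{\sqrt{|K''(0)|}^{\,\ell}} \bar P^{(\ell)}(\tau) \right| \le \epsilon. \]

By plugging in the grid size and modifying the condition on $M$, the proof is complete. □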


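H Proof of Proposition 7

Proof. Denote $f(x_1, x_2) = \frac{1}{2}\|y - x_1 - g \odot x_2\|_2^2 + \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} \right)$ as the objective function of (9). Since $\{\hat x_1, \hat x_2\}$ is the minimizer of (9), for all $\alpha_w \in (0, 1]$ and all $\{x_1, x_2\}$, we have

\[ f\left( \alpha_w x_1 + (1 - \alpha_w) \hat x_1, \alpha_w x_2 + (1 - \alpha_w) \hat x_2 \right) \ge f(\hat x_1, \hat x_2). \]

This is equivalent to the following

\[ \alpha_w^{-1} \lambda_w \left( \|\hat x_1 + \alpha_w (x_1 - \hat x_1)\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} \right) + \alpha_w^{-1} \lambda_w \left( \|\hat x_2 + \alpha_w (x_2 - \hat x_2)\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right) \]
\[ \ge \langle y - (\hat x_1 + g \odot \hat x_2), (x_1 - \hat x_1) + g \odot (x_2 - \hat x_2) \rangle_{\mathbb{R}} - \frac{1}{2} \alpha_w \|x_1 - \hat x_1 + g \odot (x_2 - \hat x_2)\|_2^2. \]

As the atomic norm $\|\cdot\|_{\mathcal{A}}$ is convex, the following inequalities hold:

\[ \|x_1\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} \ge \alpha_w^{-1} \left( \|\hat x_1 + \alpha_w (x_1 - \hat x_1)\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} \right), \]
\[ \|x_2\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \ge \alpha_w^{-1} \left( \|\hat x_2 + \alpha_w (x_2 - \hat x_2)\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right), \]

which can be plugged into the previous inequality to obtain

\[ \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right) \ge \langle y - (\hat x_1 + g \odot \hat x_2), (x_1 - \hat x_1) + g \odot (x_2 - \hat x_2) \rangle_{\mathbb{R}} - \frac{1}{2} \alpha_w \|x_1 - \hat x_1 + g \odot (x_2 - \hat x_2)\|_2^2. \]

Letting $\alpha_w \to 0$, we obtain that $\{\hat x_1, \hat x_2\}$ is the minimizer of (9) only if for all $\{x_1, x_2\}$,

\[ \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right) \ge \langle y - (\hat x_1 + g \odot \hat x_2), (x_1 - \hat x_1) + g \odot (x_2 - \hat x_2) \rangle_{\mathbb{R}}. \tag{46} \]

On the other hand, if (46) holds for all $\{x_1, x_2\}$, we have

\[ f(x_1, x_2) = \frac{1}{2}\|y - x_1 - g \odot x_2\|_2^2 + \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} \right) \]
\[ = \frac{1}{2}\|y - \hat x_1 - g \odot \hat x_2 + \hat x_1 + g \odot \hat x_2 - x_1 - g \odot x_2\|_2^2 + \lambda_w \left( \|\hat x_1\|_{\mathcal{A}} + \|\hat x_2\|_{\mathcal{A}} \right) + \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right) \]
\[ = \frac{1}{2}\|y - \hat x_1 - g \odot \hat x_2\|_2^2 + \lambda_w \left( \|\hat x_1\|_{\mathcal{A}} + \|\hat x_2\|_{\mathcal{A}} \right) + \frac{1}{2}\|\hat x_1 + g \odot \hat x_2 - x_1 - g \odot x_2\|_2^2 \]
\[ \quad + \langle y - \hat x_1 - g \odot \hat x_2, \hat x_1 + g \odot \hat x_2 - x_1 - g \odot x_2 \rangle_{\mathbb{R}} + \lambda_w \left( \|x_1\|_{\mathcal{A}} + \|x_2\|_{\mathcal{A}} - \|\hat x_1\|_{\mathcal{A}} - \|\hat x_2\|_{\mathcal{A}} \right) \]
\[ \ge f(\hat x_1, \hat x_2) + \frac{1}{2}\|\hat x_1 + g \odot \hat x_2 - x_1 - g \odot x_2\|_2^2 \ge f(\hat x_1, \hat x_2). \]

Therefore, (46) holds if and only if $\{\hat x_1, \hat x_2\}$ is the minimizer of (9).

Furthermore, we can rewrite (46) by moving all the terms containing $\{x_1, x_2\}$ onto one side as

\[ \lambda_w \left( \|\hat x_1\|_{\mathcal{A}} + \|\hat x_2\|_{\mathcal{A}} \right) - \langle y - (\hat x_1 + g \odot \hat x_2), \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} \]
\[ \le \lambda_w \|x_1\|_{\mathcal{A}} - \langle y - (\hat x_1 + g \odot \hat x_2), x_1 \rangle_{\mathbb{R}} + \lambda_w \|x_2\|_{\mathcal{A}} - \langle y - (\hat x_1 + g \odot \hat x_2), g \odot x_2 \rangle_{\mathbb{R}}. \tag{47} \]

Since (47) holds for all $\{x_1, x_2\}$, it still holds when taking the infimum of the right-hand side with respect to $\{x_1, x_2\}$. That is,

\[ \lambda_w \left( \|\hat x_1\|_{\mathcal{A}} + \|\hat x_2\|_{\mathcal{A}} \right) - \langle y - (\hat x_1 + g \odot \hat x_2), \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} \]
\[ \le \inf_{x_1} \left\{ \lambda_w \|x_1\|_{\mathcal{A}} - \langle y - (\hat x_1 + g \odot \hat x_2), x_1 \rangle_{\mathbb{R}} \right\} + \inf_{x_2} \left\{ \lambda_w \|x_2\|_{\mathcal{A}} - \langle y - (\hat x_1 + g \odot \hat x_2), g \odot x_2 \rangle_{\mathbb{R}} \right\}. \]

Plugging in the facts that each infimum equals $0$ when the corresponding dual atomic norm, $\|y - (\hat x_1 + g \odot \hat x_2)\|_{\mathcal{A}}^*$ or $\|\bar g \odot (y - (\hat x_1 + g \odot \hat x_2))\|_{\mathcal{A}}^*$, is at most $\lambda_w$, and equals $-\infty$ otherwise, we have

\[ \langle y - (\hat x_1 + g \odot \hat x_2), \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} \ge \lambda_w \|\hat x_1\|_{\mathcal{A}} + \lambda_w \|\hat x_2\|_{\mathcal{A}}, \]

as well as

\[ \|y - (\hat x_1 + g \odot \hat x_2)\|_{\mathcal{A}}^* \le \lambda_w, \quad \text{and} \quad \|\bar g \odot (y - (\hat x_1 + g \odot \hat x_2))\|_{\mathcal{A}}^* \le \lambda_w. \]

The reverse inequality $\langle y - (\hat x_1 + g \odot \hat x_2), \hat x_1 + g \odot \hat x_2 \rangle_{\mathbb{R}} \le \lambda_w \|\hat x_1\|_{\mathcal{A}} + \lambda_w \|\hat x_2\|_{\mathcal{A}}$ follows from the two dual norm bounds, which gives the equality stated in Proposition 7. □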

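I Proof of Proposition 8

Proof. We first record a useful lemma from [37].

Lemma 5. [37, Lemma 1] For any $2M$th-order trigonometric polynomial $X(\tau) = \langle x, c(\tau) \rangle$, we have

\[ \left| \int_0^1 X(\tau)\,\nu_i(d\tau) \right| \le \|x\|_{\mathcal{A}}^* \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right), \]

for $i = 1, 2$, where $P_A(\nu_i)$ denotes the projection of the measure $\nu_i$ on the support set $A$.

Setting $X(\tau) = \langle e_i, c(\tau) \rangle$ for $i = 1, 2$ in Lemma 5, we obtain

\[ \|e_i\|_2^2 = \left\langle e_i, \int_0^1 c(\tau)\,\nu_i(d\tau) \right\rangle = \int_0^1 \langle e_i, c(\tau) \rangle\,\nu_i(d\tau) \]
\[ \le \|e_i\|_{\mathcal{A}}^* \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right) \le \sqrt{4M+1}\,\|e_i\|_2 \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right), \]

where we used the fact $\|e_i\|_{\mathcal{A}}^* = \sup_{\tau \in [0,1)} |\langle e_i, c(\tau) \rangle| \le \|e_i\|_2 \|c(\tau)\|_2 = \sqrt{4M+1}\,\|e_i\|_2$ following the Cauchy-Schwarz inequality. This yields the estimation error of $x_i^\star$ in (40). For the denoising error, first notice that

\[ \|e_1 + g \odot e_2\|_{\mathcal{A}}^* \le \|y - \hat x_1 - g \odot \hat x_2\|_{\mathcal{A}}^* + \|y - x_1^\star - g \odot x_2^\star\|_{\mathcal{A}}^* \le \lambda_w + \|w\|_{\mathcal{A}}^* \tag{48} \]
\[ \le 2\lambda_w, \tag{49} \]

where (48) follows from Proposition 7, and (49) follows from $\|w\|_{\mathcal{A}}^* = \sup_{\tau \in [0,1)} |\langle w, c(\tau) \rangle| \le \|w\|_2 \|c(\tau)\|_2 \le \sigma_w \sqrt{4M+1} = \lambda_w / C_w \le \lambda_w$. Similarly, we have $\|\bar g \odot w\|_{\mathcal{A}}^* \le \lambda_w / C_w$ and consequently $\|\bar g \odot e_1 + e_2\|_{\mathcal{A}}^* \le 2\lambda_w$. Therefore, we have

\[ \|e_1 + g \odot e_2\|_2^2 = \langle e_1 + g \odot e_2, e_1 \rangle + \langle \bar g \odot e_1 + e_2, e_2 \rangle \]
\[ = \left\langle e_1 + g \odot e_2, \int_0^1 c(\tau)\,\nu_1(d\tau) \right\rangle + \left\langle \bar g \odot e_1 + e_2, \int_0^1 c(\tau)\,\nu_2(d\tau) \right\rangle \]
\[ = \int_0^1 \langle e_1 + g \odot e_2, c(\tau) \rangle\,\nu_1(d\tau) + \int_0^1 \langle \bar g \odot e_1 + e_2, c(\tau) \rangle\,\nu_2(d\tau) \]
\[ \le \|e_1 + g \odot e_2\|_{\mathcal{A}}^* \left( \|P_{T^1_{\mathrm{far}}}(\nu_1)\|_{TV} + \sum_{j=0}^{2} I_{1,j} \right) + \|\bar g \odot e_1 + e_2\|_{\mathcal{A}}^* \left( \|P_{T^2_{\mathrm{far}}}(\nu_2)\|_{TV} + \sum_{j=0}^{2} I_{2,j} \right) \]
\[ \le 2\lambda_w \sum_{i=1}^{2} \left( \|P_{T^i_{\mathrm{far}}}(\nu_i)\|_{TV} + \sum_{j=0}^{2} I_{i,j} \right), \]

where we used (49) in the last inequality. □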

J Proof of Proposition 9 

We first construct a pair of trigonometric polynomials $P_1(\tau)$ and $Q_1(\tau)$ with the following properties, whose proof can be found in Appendix L.

Lemma 6 . Assume that gn = ’s are i.i.d. randomly generated from a uniform distribution on the 

complex unit circle with (fn W[0,1]. Provided that the separation A > 1 /M, there exists a numerical 
constant C such that as soon as 


M > C max < log 


M{Ki+K 2 ) 


, ATinax log 


M{Ki+K 2 ) 


.-K^maxlog 


Ki + K2 

V 


we can construct Pi (r) = Y,n=-2M and Qi (r) = J2n=-2M that satisfy 

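\[ |P_1(\tau) - \mathrm{sign}(a_{1k})(\tau - \tau_{1k})| \le C_p M (\tau - \tau_{1k})^2, \quad \tau \in T^{1,k}_{\mathrm{near}}, \quad k = 1, \ldots, K_1, \]
\[ |P_1(\tau)| \le \frac{C_p'}{M}, \quad \tau \in T^1_{\mathrm{far}}, \]
\[ |Q_1(\tau) - \mathrm{sign}(a_{2k})(\tau - \tau_{2k})| \le C_q M (\tau - \tau_{2k})^2, \quad \tau \in T^{2,k}_{\mathrm{near}}, \quad k = 1, \ldots, K_2, \]
\[ |Q_1(\tau)| \le \frac{C_q'}{M}, \quad \tau \in T^2_{\mathrm{far}}, \]

with probability at least $1 - \eta$, where $C_p$, $C_p'$, $C_q$ and $C_q'$ are numerical constants.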

Furthermore, we derive the following useful lemma in Appendix M. 

Lemma 7. For P{t) and Q(t) constructed in Proposition 6, and Pi(t) and Qi{t) constructed in Lemma 6, 
there exist numerical constant C and Ci such that 


j\{T)v,{dT)+ j\{T)v2{dT) 

f Pl{T)v,{dT)+ f Q,{T)v 2 {dT) <C,\ ./^-axlogM 

JO JO 


(50) 

(51) 


with high probability given in Theorem 2.2. 
Proof. Consider the polar form 


.{dr) 


— ^-JPik 


/ Vi {dr), i = l,2, 

^ A near 


then we can construct a pair of dual polynomials P (r) and Q (r) that interpolate a pair of point sources 
with sign(aifc) = as in Proposition 6. Therefore, we have 


ifi 


■^1.0 = 


k^l 

Ki 


f i.fc 


Vi (dr) 


k—1 k—1 near 

« Ki p 

/ P (r) i/i (dr) — P (t) vi (dr) + — P (r)) (dr) . 

^ dxj r—1 


Similarly, 


fc=i • 


K2 


h.o = 


[ Q (r) V 2 (dr) - [ Q{t) V 2 (dr) + V' / (e - Q (r)) 1/2 (dr). 
do dxf„ ^ dxJi 


Now consider their sum, then we have 
2 


/■! II 

^It,o< / P(r)^i(dr)+/ Q(r)j 22 (dr) P-fi^_^(j 2 ,) 

i=l do do 

Ki „ K 2 . 

+ X! / ife C'p^^ (x - Tikf \vi\ (dr) + X! / 2 CqM^ (r - r 2 fc)^ |j^ 2 | (dr) 

k—1 '^^xiear fc=l 

/*1 /*1 ^ II ^ 

/ P(r)j/i(dr)+ / Q(r)i^ 2 (dr) + V Nxi ( 12 ,) +C 2 VA ,2 

do do xy ^ 


< cUS:^ + E l^r.. (-■)||^, + ft E X., 

2=1 2=1 


M 


(52) 

(53) 

(54) 
where (52) follows from the triangle inequality and the properties of the dual polynomials in Proposition 6, (53) follows from the definition of $I_{i,2}$, and (54) follows from Lemma 7.

Then, we consider bounding $\sum_{i=1}^{2} I_{i,1}$ in a similar way. Again, consider the polar form


(r - nk) Vi [dr) = e 


/ (r - Tik) Vi (dr ), 

Jri’h. 


* = 1 , 2 , 


then we can construct a pair of polynomials Pi (r) and Qi (r) in the form of Lemma 6 by letting sign (dik) = 
g-jpik _ Then we have 


h. 


AM 


h—\ ^ ^ near 

Ki . Ki 

= X! / 1 ,, (r - Tifc) - Pi (r)) vi (dr) + J2 

^•—1 ^near k—1. ^near 

= ^ ^ (r - Tifc) - Pi (r)) (dr) + f Pi (r) i/i (dr) - f Pi (r) z/i (dr) 


and 


-I 2.1 


4M+ 1 


^ X! / 2 fc ~ + f Qi (r) 1/2 (dr) “ / Qi ('^) *^2 (dr) . 

k^l^Tnear dO •* 


Taking their sum, we have 

2 ,i 


^4i < (4M + 1)( 


1 

Pi (r) (dr) + I Qi (r) 1/2 (dr) 


0 


Ki . 

+ X! / 1 (r - rife) - Pi (r)| \vi \ (dr) 

k=l dfniar 


K 2 


+ ^ /s, I® (T-'r 2 fe)-Qi (t)| |j^ 2 | (dr)) +C3^ ||PTj^^(j 2 i) 

»- 1 *^T^"near - — 1 

7L3 ,, log M + C 3 ^ ||Ptj„ (v.) 




2=1 


TV 




M 


TV 


where the first inequality follows from the triangle inequality and Lemma 6, and the last inequality follows from Lemma 7, the definition of $I_{i,2}$ and Lemma 6. □


K Proof of Proposition 10 

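Proof. Let $\hat u_i$ and $u_i^\star$ denote the representing measures of $\hat x_i$ and $x_i^\star$, then we have $\nu_i = \hat u_i - u_i^\star$, $i = 1, 2$. Since $\|u_i^\star\|_{TV} = \|x_i^\star\|_{\mathcal{A}}$ and $\|\hat u_i\|_{TV} = \|\hat x_i\|_{\mathcal{A}}$, from Proposition 7, we have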

llwillTy + ||'«2 ||tv = ||ii|U + ll^zlU 

= -^{y - {xi+gQx2),ei+gQ e2)B 

+ {xi + 9 0 * 2 ) ,®t)R + -^{y - {xi +gQx2),gQxl)R 

Auj A-u) 

< -^{y - (®1 + 9 0 *2), ei + 9 0 e2)R + ||a;tlU + II^^^IU 

'^W 

= ^(®i + 9 0 ®2 + *** - (*1 + 5 © *2), ei + 9 0 e2)R + ||ict||_4 + H^JIU 
Aw 

1 2 1 

= — — ||ei + 9 0 62112 + ^1 + 9 0 62)8 + llu^llyy + ||u2||j.y 

Aw Aw 
(55)


^ IKiIItf + IKJIIry + 1ei + g 0 e2)R| . 


Then the last term in (55) can be bounded by 
|(■m,el + 9 0 62)1 < |(tn,ei)| + \ {gQw,e 2 )\ 


{w, c (t))i^i (dr) 


{gQw,c{T))v 2 [dr) 


< lltnil 


H I + 119 0 ^11^ I + J2 | > (56) 


3=0 

^ 7^^ I (^®) + dj, 

Cw ^ \ TV ^ 


TV 




2=1 


(57) 


3=0 


where (56) follows from Lemma 5, and the last inequality (57) follows from ||m||^ < Xw/C^ and \\g 0 'w\\*y^ < 
Xw/Cw Moreover, since 


+ llTy “ W'^i ~X ^iIItV — IItV ll-fTi (yi)\\rpy + ll+V? (i^i)! 


TV ’ * 


plugging this and (57) into (55), we have 


^\\Pr^{vi)\\Tv X] X! ( + 


2=1 


2=1 


2=1 


TV 


(58) 


3=0 


Set P{t) and Q{t) as a pair of polynomials that interpolate the conjugate sign of Pti{vi) and Pt^{v 2 ): 
respectively, whose existence is established in Proposition 6, then we have 


'^\\PTi{v,)\\,^y= j P{T)PT,{vi)(dT)+ f Q{T)Pr^{v2){dT) 

i=i do Jo 


< 

f P{T)vi{dT)+ f Q{T)u2{dT) 

+ 

/ P{T)vi{dT) 

+ 

[ Q{T)v2{dT) 


JQ Jo 


Jyc 


Jr- 


(59) 


< CX^ 


Km^y.'^OgM 


M 


P{t)vi (dr) 


Q{T)v2{dT) 


where the first term in (59) can be bounded using Lemma 7. For the second term in (59), according to the 
properties of P{t) established in Proposition 6, we have 


P{T)vi[dT) 


Ki 


P{T)vi{dT) + Y^ 




P(T)i/i(dr) 


ifi 


0 E 

fc=i 

Ki 


'TiiAtnfc} 


P{t)vi (dr) 


P(r)z^i(dr) 


Ki . 

^E / 

t^idrl’JlMrik} 

Ki 


|P(r)| \vi\{dr) + (1 - Cb) f \vi\(dr) 

drl^ 

(l - CM'^ {t - Tikf'j \vi\{dT) + {1 - Cb) j ^ Wilidr) 


= E / . Wilidr) + f Wi\{dT) 

t~idri’Jlr\Cik} Jrl^ 
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Ki 


-c'^j {t - Tikf \vi\{dT) - Cb [ \vi\{dT) 


< ||Pyc (l/i)||^^ — Call,2 — Cb Pxi„(^l) 


TV 


(60) 


for some positive constants $C_a$ and $C_b$. A similar bound holds for the third term. Putting these together, we have

2 2 2 I - 

E - E 11^^^ ("*)IItv > E + Cb (61) 


which combined with (58) yields: 

^E I ^,, + E^*.. 


CX„ 


TV 


i=o 




M 


Z 

- E {CaC,2 + Cb Pri^^iyi) . 


The proof is finished by reorganizing terms and plugging in Proposition 9, for a large enough constant 

> 1 . □ 

L Proof of Lemma 6 

Here we construct the pair of polynomials $P_1(\tau)$ and $Q_1(\tau)$ using the same techniques as in the proof of Theorem 2.1. Recall the definitions of $K(\tau)$, $K_g(\tau)$ and $K_{\bar{g}}(\tau)$ in (19) and (20); we construct the two polynomials $P_1(\tau)$ and $Q_1(\tau)$ as

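$$P_1(\tau) = \sum_{k=1}^{K_1}\theta_{1k}K(\tau - \tau_{1k}) + \sum_{k=1}^{K_1}\psi_{1k}K'(\tau - \tau_{1k}) + \sum_{k=1}^{K_2}\theta_{2k}K_{\bar{g}}(\tau - \tau_{2k}) + \sum_{k=1}^{K_2}\psi_{2k}K_{\bar{g}}'(\tau - \tau_{2k}), \qquad (62)$$

and

$$Q_1(\tau) = \sum_{k=1}^{K_1}\theta_{1k}K_g(\tau - \tau_{1k}) + \sum_{k=1}^{K_1}\psi_{1k}K_g'(\tau - \tau_{1k}) + \sum_{k=1}^{K_2}\theta_{2k}K(\tau - \tau_{2k}) + \sum_{k=1}^{K_2}\psi_{2k}K'(\tau - \tau_{2k}), \qquad (63)$$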


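where $\tau_{1k} \in T_1$ and $\tau_{2k} \in T_2$. Set the coefficients $\theta_i = [\theta_{i1}, \ldots, \theta_{iK_i}]^T$ and $\psi_i = [\psi_{i1}, \ldots, \psi_{iK_i}]^T$, for $i = 1, 2$, by solving the following set of equations

$$
\begin{cases}
P_1(\tau_{1k}) = 0, & \tau_{1k} \in T_1, \\
P_1'(\tau_{1k}) = \mathrm{sign}(a_{1k}), & \tau_{1k} \in T_1, \\
Q_1(\tau_{2k}) = 0, & \tau_{2k} \in T_2, \\
Q_1'(\tau_{2k}) = \mathrm{sign}(a_{2k}), & \tau_{2k} \in T_2,
\end{cases}
$$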

which can be rewritten into a matrix form as 


Wi 


10 


fWi 


w, 


Vl^"(0)|'^“ ~ -T;k^^92 

W-aQ - W-gi W 20 - W 21 

~W^\^92 -;y=^W21 -|^7^W22 


ei 

^\K" (0)|rAi 

62 

,V\K^\xl22. 


0 

1 


y/\K"m 

0 

1 

' y/\K"m 


Ui 


U2 


whose left-hand side matrix is the same as that in (24), called $W$, where $K''(0)$ is the scalar defined in (25). Therefore, following Proposition 3, under the event $\mathcal{E}_\delta$, $W$ is invertible, which gives


01 


[VW^\'^i\ vw 


1 


[RlUl P Rg U2) , and 


02 




1 


[RgUl + R2U2) ■ 
And further we know


vli?^ 


P® (r) = - 


VW‘ 


v^i (r) (Hitti + RgU2) - 


VWW\ 


V2I (t) {RgUl + R2U2) ■ 


Under this choice, we will establish that $P_1(\tau)$ satisfies the properties in Lemma 6; the claim for $Q_1(\tau)$ follows similarly. Denote


1 




(^) = - 


1 


V\KPW\ 


{uuRgiVu (r)), 


then it is straightforward to obtain the following proposition to bound the distance between 


and 


P^l (r), following essentially the same proof of Proposition 5. 




pF^ (r) 


Lemma 8. Suppose A > 1/M. There exists a numerical constant C such that 


M > C max log 
then we have 


2 r M{Ki+K2) 

V n 


1 


jPF^ (t) - 


V\K^\ 


-^Ku 


A‘i (-) 


:l0g 


M (Jfi + K 2 ) 
■q 


— 


:l0g 


K 1 +K 2 

1 


vW^\ 


< 


4M + 1 


, Vre [0,1), Z = 0,1,2,3 > 1-r;. 


When T G since \P^i (r)| < for some numerical constant C [41, Lemma 2.7], under the event 

in Lemma 8, we have 

Ifl Ml < IP,. Ml + < A 

for some numerical constant $C_1$. Next consider $|P_1(\tau) - \mathrm{sign}(a_{1k})(\tau - \tau_{1k})|$ when $\tau$ lies in the neighborhood of $\tau_{1k}$. Without loss of generality, assume $\tau_{1k} = 0$. Denote $Z(\tau) = \mathrm{sign}(a_{1k})\tau - P_1(\tau) = Z_R(\tau) + jZ_I(\tau)$, where $Z_R(\tau)$ and $Z_I(\tau)$ are the real part and the imaginary part of $Z(\tau)$, respectively. Thus we have $Z_R(0) = 0$, $Z_R'(0) = 0$, $Z_I(0) = 0$, and $Z_I'(0) = 0$. Similarly define $Z_\mu(\tau) = \mathrm{sign}(a_{1k})\tau - P_{\mu 1}(\tau) = Z_{\mu R}(\tau) + jZ_{\mu I}(\tau)$, where $Z_{\mu R}(\tau)$ and $Z_{\mu I}(\tau)$ are the real part and the imaginary part of $Z_\mu(\tau)$, respectively. Since $|Z_{\mu R}''(\tau)| \le CM$ and $|Z_{\mu I}''(\tau)| \le CM$ for some constant $C$ from the proof of Lemma 6.1 in [41], combining with Lemma 8, we can obtain $|Z_R''(\tau)| \le C_pM$ and $|Z_I''(\tau)| \le C_pM$ with a numerical constant $C_p$. Then we have
$|\mathrm{sign}(a_{1k})\tau - P_1(\tau)| = |Z(\tau)| \le C_pM\tau^2$.
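
The last step is a second-order Taylor argument, which we spell out here for completeness. The sketch assumes, as in the bounds just stated, that the second derivatives of the real part $Z_R$ and the imaginary part $Z_I$ of $Z$ are each bounded by $C_pM$ between $0$ and $\tau$; the points $\xi_R$ and $\xi_I$ are the intermediate points supplied by Taylor's theorem and are introduced only for this illustration.

```latex
\begin{align*}
Z_R(\tau) &= Z_R(0) + Z_R'(0)\,\tau + \tfrac{1}{2} Z_R''(\xi_R)\,\tau^2
           = \tfrac{1}{2} Z_R''(\xi_R)\,\tau^2, \\
Z_I(\tau) &= Z_I(0) + Z_I'(0)\,\tau + \tfrac{1}{2} Z_I''(\xi_I)\,\tau^2
           = \tfrac{1}{2} Z_I''(\xi_I)\,\tau^2, \\
|Z(\tau)| &\le |Z_R(\tau)| + |Z_I(\tau)|
           \le \tfrac{1}{2} C_p M \tau^2 + \tfrac{1}{2} C_p M \tau^2
           = C_p M \tau^2 .
\end{align*}
```

The factor $1/2$ in each Taylor remainder is what allows the real and imaginary parts to be combined under the single constant $C_p$.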


M Proof of Lemma 7 

We record the following lemma whose proof is given in Appendix N. 

Lemma 9. Set $M \ge 4$. There exist numerical constants $C_1$, $C_2$ and $C_3$ such that we have $|K_g(\tau)| \le C_1\sqrt{\frac{\log M}{M}}$ and $|K_g'(\tau)| \le C_2\sqrt{M\log M}$ with probability at least $1 - C_3(M^3\log M)^{-1/2}$.

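Proof. Since $P(\tau) = \langle p, c(\tau)\rangle$ and $Q(\tau) = \langle \bar{g} \odot p, c(\tau)\rangle$, we have

$$
\begin{aligned}
\left|\int_0^1 P(\tau)\,\nu_1(d\tau) + \int_0^1 Q(\tau)\,\nu_2(d\tau)\right|
&= \left|\int_0^1 \langle p, c(\tau)\rangle\,\nu_1(d\tau) + \int_0^1 \langle \bar{g} \odot p, c(\tau)\rangle\,\nu_2(d\tau)\right| \\
&= \left|\left\langle p, \int_0^1 c(\tau)\,\nu_1(d\tau)\right\rangle + \left\langle p, \int_0^1 g \odot c(\tau)\,\nu_2(d\tau)\right\rangle\right| \\
&= \left|\langle p,\, e_1 + g \odot e_2\rangle\right| \\
&= \left|\langle P(\tau), E(\tau)\rangle\right| \\
&\le \|P(\tau)\|_{L_1}\,\|e_1 + g \odot e_2\|_{\mathcal{A}}^*, \qquad (64)
\end{aligned}
$$

where $E(\tau) = \langle e_1 + g \odot e_2, c(\tau)\rangle$ and $\|P(\tau)\|_{L_1} = \int_0^1 |P(\tau)|\,d\tau$. Here the penultimate step follows from Parseval's identity, and the last inequality follows from Hölder's inequality. Therefore, we need to bound $\|P(\tau)\|_{L_1}$. Recall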

CXi 




= LiUi + L„U 2 , and 


Oi2 


[./WW\f32\ 


— Lnlll + L 2 U 2 


in (33). Define 






= LniUi, and 






— Lf^2U2- 


r oLi 





y\K" mf 3 gi\ 


From [18, Lemma 2.2], we have ||a^ti||oo — ll/^Milloo — ^ some constants Ca and Cp, i = 1,2. 

Under the event $\mathcal{E}_\delta$ for $0 < \delta \le 1/4$ in Lemma 1, we have

<||(Ll-L^l)«i|L + ||igM2|L 

< 11(1,1 - I,^i)mi|| 2 + ||I,gM2|l2 

< ||Li - X^ill ||«i||2 + IILgll IIM2II2 < 

Therefore, we have ||q:i||oo — and ||/3i||g^ < -^VR'max for some constants C'^ and Cp. Similar 

bounds hold for ||a 2 ||oo and ||/32||oo well. Then ||P(t)||j^ can be bounded as below: 


('r)lli=/ l^^('r)|dT 
Jo 

illailloo/ |/^(r)|dr + /Li||/3i||^ / |iL'(r)| dr + ||a 2 lL / [i^g (r) | dr + 11/32IL / |3^;(r)|dr 

^0 Jo Jo Jo 

< + 3^1^\/3L„,axC + K2CWKu..xCl^^-^j^ + if2^\/3^maxC2 VMlogM 


113^ 

< K 


<Cp 


K^e^x^OgM 

M 


where we used /p^|Rr(r)|dr < jiL'(r)| dr < C from [37, Lemma 4[, and |dLg(r)| < 

\K'g (r)| < C 2 ^JMlog M from Lemma 9. Plugging this into (64) and combining (49), we have proved (50). 
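
The Parseval and Hölder steps behind (64) can be checked numerically. The following is a minimal sketch under stated assumptions: the coefficient vectors `p` and `e` are random stand-ins (hypothetical, not the dual certificate or the error vector from the proof), the helper names are ours, and both polynomials are sampled on a uniform grid fine enough that the grid average of their product equals the integral exactly.

```python
import numpy as np

def trig_poly(coeffs, taus, M):
    """Evaluate sum_n coeffs[n] * exp(j*2*pi*n*tau) for n = -2M..2M."""
    n = np.arange(-2 * M, 2 * M + 1)
    return np.exp(2j * np.pi * np.outer(taus, n)) @ coeffs

def holder_check(M=8, seed=0):
    rng = np.random.default_rng(seed)
    size = 4 * M + 1
    # Hypothetical stand-ins for the coefficient vectors in the proof.
    p = rng.standard_normal(size) + 1j * rng.standard_normal(size)
    e = rng.standard_normal(size) + 1j * rng.standard_normal(size)

    # The product P * conj(E) only contains frequencies |n| <= 4M, so a
    # uniform grid with more than 4M points averages it exactly.
    N = 16 * M + 4
    taus = np.arange(N) / N
    P = trig_poly(p, taus, M)
    E = trig_poly(e, taus, M)

    inner_coeff = np.vdot(e, p)            # sum_n conj(e_n) * p_n
    inner_func = np.mean(P * np.conj(E))   # integral of P * conj(E) on [0, 1)
    l1_P = np.mean(np.abs(P))              # grid estimate of the L1 norm of P
    linf_E = np.max(np.abs(E))             # grid estimate of the sup norm of E
    return inner_coeff, inner_func, l1_P, linf_E

inner_coeff, inner_func, l1_P, linf_E = holder_check()
print(abs(inner_coeff), l1_P * linf_E)
```

The first two returned values agree up to floating-point error (Parseval), and the modulus of the inner product never exceeds the product of the two grid norms (Hölder applied to the grid average).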
Next, we can write similarly that 


f Pi (r) vi (dr) + [ Qi (r) 1^2 {dr) 
JO Jo 


< \\Pi ('r)lli l|ei +g(De 2 \W, 


then it suffices to bound $\|P_1(\tau)\|_{L_1}$. Recall that


01 


y'lK" {0)\iPi\ ^\K" (0)1 

in Appendix L. Define 


Vi 


^\K" (0)|r/>^ij y^|A:"(0)| 


(iJiMi + RgU 2 ), and 


Rg_iUi, and 




^^2 






(65) 


{RgUl + R 2 U 2 ) 


zRp2U2- 


From [41, Lemma 2.7[, we have ||0/ji|[^ < Cg/M and ll'i/’itilloo — fo^' some constants Cg and C.^, 

i = 1,2. Following similar arguments as above, we have ||0i||oo < Cgy^Ki^/M and Il’j/’iHoo < , 

i = 1,2. Hence ||Pi (r)|| can be bounded as 


l|3^i ('r)lli = / 13^1 ("t)! dr 
^0 
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<^l||0l|loc [\k {T)\dT + [' \K' {T)\dT + K^We^W^ r\Kg{T)\dT + K^U2\L 

Jo Jo Jo Jo 

< + K2^V^.C2./M\ogM 


<Cp 




M3 


Plugging this into (65) and combining (49), we have proved (51). 


□ 


N Proof of Lemma 9 


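Proof. Suppose $M \ge 4$. For a fixed $\tau \in [0,1)$, applying Hoeffding's inequality in Lemma 4, we have

$$\mathbb{P}\{|K_g(\tau)| \ge \epsilon\} = \mathbb{P}\left\{\left|\frac{1}{M}\sum_{n=-2M}^{2M} s_n g_n e^{j2\pi n\tau}\right| \ge \epsilon\right\} \le 4e^{-\frac{M^2\epsilon^2}{4\sum_{n=-2M}^{2M}|s_n|^2}} \le 4e^{-\frac{M^2\epsilon^2}{4(4M+1)}} \le 4e^{-\frac{M\epsilon^2}{17}},$$

where we used $|s_n| \le 1$. Let $T_{\mathrm{grid}} = \{\tau_d \in [0,1)\}$ be a uniform grid of $[0,1)$ whose size will be determined later. As a result of the union bound, we have

$$\mathbb{P}\left\{\sup_{\tau_d \in T_{\mathrm{grid}}} |K_g(\tau_d)| \le \epsilon\right\} \ge 1 - 4\,|T_{\mathrm{grid}}|\,e^{-\frac{M\epsilon^2}{17}}.$$

For any $\tau_a, \tau_b \in [0,1)$, following Lemma 3 we have

$$|K_g(\tau_a) - K_g(\tau_b)| \le |\tau_a - \tau_b|\,\sup_\tau\left|\frac{dK_g(\tau)}{d\tau}\right| \le 4\pi\,|\tau_a - \tau_b|\cdot 2M\,\sup_\tau|K_g(\tau)| \le 40\pi M\,|\tau_a - \tau_b|,$$

where the last inequality follows from $|K_g(\tau)| \le \frac{1}{M}\sqrt{\sum_{n=-2M}^{2M}|g_n|^2}\,\sqrt{\sum_{n=-2M}^{2M}|s_n e^{j2\pi n\tau}|^2} \le 5$. We choose the grid size such that for any $\tau \in [0,1)$ there exists a point $\tau_d \in T_{\mathrm{grid}}$ satisfying $40\pi M\,|\tau - \tau_d| \le \epsilon$, which means we can set $|T_{\mathrm{grid}}| = 40\pi M/\epsilon$. Consequently, for any $\tau \in [0,1)$, we have

$$|K_g(\tau)| \le |K_g(\tau) - K_g(\tau_d)| + |K_g(\tau_d)| \le 40\pi M\,|\tau - \tau_d| + \epsilon \le 2\epsilon,$$

with probability at least $1 - 4\,|T_{\mathrm{grid}}|\,e^{-\frac{M\epsilon^2}{17}}$. Choosing $\epsilon = \sqrt{\frac{51\log M}{M}}$, we have

$$\mathbb{P}\left\{|K_g(\tau)| \le 2\sqrt{\frac{51\log M}{M}}\right\} \ge 1 - 71\,(M^3\log M)^{-1/2}.$$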


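Next consider $|K_g'(\tau)|$. For a fixed $\tau \in [0,1)$, applying Hoeffding's inequality in Lemma 4, we have

$$\mathbb{P}\{|K_g'(\tau)| \ge \epsilon\} = \mathbb{P}\left\{\left|\frac{1}{M}\sum_{n=-2M}^{2M} s_n g_n (j2\pi n)\, e^{j2\pi n\tau}\right| \ge \epsilon\right\} \le 4e^{-\frac{M^2\epsilon^2}{4\sum_{n=-2M}^{2M}|2\pi n s_n|^2}} \le 4e^{-\frac{\epsilon^2}{321\pi M}}.$$

Let $T_{\mathrm{grid}} = \{\tau_d \in [0,1)\}$ be a uniform grid of $[0,1)$ whose size will be determined later. As a result of the union bound, we have

$$\mathbb{P}\left\{\sup_{\tau_d \in T_{\mathrm{grid}}} |K_g'(\tau_d)| \le \epsilon\right\} \ge 1 - 4\,|T_{\mathrm{grid}}|\,e^{-\frac{\epsilon^2}{321\pi M}}.$$

For any $\tau_a, \tau_b \in [0,1)$, following Lemma 3 we have

$$|K_g'(\tau_a) - K_g'(\tau_b)| \le 4\pi\,|\tau_a - \tau_b|\cdot 2M\,\sup_\tau|K_g'(\tau)| \le 88\pi^2M^2\,|\tau_a - \tau_b|,$$

where in the last inequality we use $|K_g'(\tau)| \le \frac{1}{M}\sqrt{\sum_{n=-2M}^{2M}|g_n|^2}\,\sqrt{\sum_{n=-2M}^{2M}|s_n (j2\pi n)|^2} \le 11\pi M$.

Hence, we choose the grid size such that for any $\tau \in [0,1)$ there exists a point $\tau_d \in T_{\mathrm{grid}}$ satisfying $88\pi^2M^2\,|\tau - \tau_d| \le \epsilon$, which gives $|T_{\mathrm{grid}}| = 88\pi^2M^2/\epsilon$. Then for any $\tau \in [0,1)$, we have

$$|K_g'(\tau)| \le |K_g'(\tau) - K_g'(\tau_d)| + |K_g'(\tau_d)| \le 88\pi^2M^2\,|\tau - \tau_d| + \epsilon \le 2\epsilon$$

with probability at least $1 - 4\,|T_{\mathrm{grid}}|\,e^{-\frac{\epsilon^2}{321\pi M}}$. Choosing $\epsilon = \sqrt{963\pi M\log M}$ gives

$$\mathbb{P}\left\{|K_g'(\tau)| \le 2\sqrt{963\pi M\log M}\right\} \ge 1 - 64\,(M^3\log M)^{-1/2}.$$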

□ 
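
The quantities bounded in this proof can be simulated directly. The sketch below is an illustration only and rests on assumptions not restated in this appendix: the weights `s` are taken to be a scaled self-convolution of a triangular window (one choice satisfying $|s_n| \le 1$), the $g_n$ are drawn i.i.d. with unit modulus and uniform phase, and the function names are ours. It evaluates $K_g$ and $K_g'$ on a uniform grid and prints the observed suprema next to the thresholds $2\sqrt{51\log M/M}$ and $2\sqrt{963\pi M\log M}$ from the proof.

```python
import numpy as np

def kernel_weights(M):
    """Assumed weights s_n, n = -2M..2M: scaled self-convolution of a triangular window."""
    tri = 1.0 - np.abs(np.arange(-M, M + 1)) / M
    return np.convolve(tri, tri) / M           # length 4M+1, entries in [0, 1]

def sup_on_grid(M=16, grid_size=4096, seed=0):
    rng = np.random.default_rng(seed)
    n = np.arange(-2 * M, 2 * M + 1)
    s = kernel_weights(M)
    # Assumption: g_n are i.i.d. with unit modulus and uniformly random phase.
    g = np.exp(2j * np.pi * rng.uniform(size=n.size))
    taus = np.arange(grid_size) / grid_size     # uniform grid on [0, 1)
    basis = np.exp(2j * np.pi * np.outer(taus, n))
    Kg = basis @ (s * g) / M                    # K_g on the grid
    dKg = basis @ (2j * np.pi * n * s * g) / M  # derivative of K_g on the grid
    return s, np.max(np.abs(Kg)), np.max(np.abs(dKg))

M = 16
s, sup_K, sup_dK = sup_on_grid(M)
print(sup_K, 2 * np.sqrt(51 * np.log(M) / M))
print(sup_dK, 2 * np.sqrt(963 * np.pi * M * np.log(M)))
```

With these assumed weights the deterministic bounds $|K_g(\tau)| \le 5$ and $|K_g'(\tau)| \le 11\pi M$ used in the Lipschitz step hold for every draw. The explicit probabilistic thresholds are loose for moderate $M$ and only become informative once $M$ is large.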